Abstract

This article presents a new class of “optimal transportation”-like distances
between arbitrary positive Radon measures. These distances are defined by two
equivalent alternative formulations: (i) a “fluid dynamic” formulation defining
the distance as a geodesic distance over the space of measures, and (ii) a
static “Kantorovich” formulation where the distance is the minimum of an
optimization program over pairs of couplings describing the transfer
(transport, creation and destruction) of mass between two measures. Both
formulations are convex optimization problems, and the ability to switch from
one to the other depending on the targeted application is a crucial property of
our models. Of particular interest is the Wasserstein-Fisher-Rao metric recently
introduced independently by [CSPV15, KMV15]. Defined initially through a dynamic
formulation, it belongs to this class of metrics and hence automatically
benefits from a static Kantorovich formulation. Switching from the initial
Eulerian expression of this metric to a Lagrangian point of view provides the
generalization of Otto’s Riemannian submersion to this new setting, where the
group of diffeomorphisms is replaced by a semi-direct product of groups. This
Riemannian submersion enables a formal computation of the sectional curvature of
the space of densities and the formulation of an equivalent Monge problem.


1 Introduction 

Optimal transport is an optimization problem which gives rise to a popular
class of metrics between probability distributions. We refer to the monograph
of Villani [Vil03] for a detailed overview of optimal transport. A major
constraint of the resulting transportation metrics is that they are restricted
to measures of equal total mass (e.g. probability distributions). In many
applications, there is however a need to compare unnormalized measures, which
corresponds to so-called “unbalanced” transportation problems, following the
terminology introduced in [Ben03]. Applications of these unbalanced metrics
range from image classification [ GT97, PW08] to the processing of neuronal
activation maps [ ]. This class of problems requires precisely quantifying the
amount of transportation, creation and destruction of mass needed to compare
arbitrary positive measures. While several proposals to achieve this goal have
been made in the literature (see below for more details), to the best of our
knowledge, a coherent framework that handles generic measures while preserving
both the dynamic and the static perspectives of optimal transport is still
lacking. It is precisely the goal of the present paper to describe such a
framework and to explore its main properties.

1.1 Previous Work 

In the last few years, there has been an increasing interest in extending optimal 
transport to the unbalanced setting of measures having non-equal masses. 

Dynamic formulations of unbalanced optimal transport. Several models based on
the fluid dynamic formulation introduced in [BB00] have been proposed recently
[ S15, LM13, PR14, PR13]. In these works, a source term is introduced in the
continuity equation. They differ in the way this source is penalized or chosen.
We refer to [CSPV15] for a detailed overview of these models.

Static formulations of unbalanced optimal transport. Purely static
formulations of unbalanced transport are however a longstanding problem. A
simple way to address this issue is given in the early work of Kantorovich and
Rubinstein [ ]. The corresponding “Kantorovich norms” were later extended to
separable metric spaces by [Han99]. These norms handle mass variations by
allowing some mass to be dropped from each location at a fixed transportation
cost. The computation of these norms can in fact be recast as an ordinary
optimal transport between normalized measures by adding a point “at infinity”
to which mass can be sent, as explained by [Gui02]. This reformulation is used
in [ ] for applications in neuroimaging. A related approach is the so-called
optimal partial transport. It was initially proposed in the computer vision
literature to perform image retrieval [ GT97, PW08], while its mathematical
properties are analyzed in detail by [CM10, Fig10]. As noted in [ ] and
recalled in Section 5.1, optimal partial transport is tightly linked to the
generalized transport proposed in [PR14, PR13] which allows a dynamic
formulation of the optimal partial transport problem. The contributions in [ ]
were inspired by [Ben03] where it is proposed to relax the marginal constraints
and to add an $L^2$ penalization term instead.

Wasserstein-Fisher-Rao metric and relation with recent work. A new metric
between measures of unequal masses has recently and independently been proposed
by [CSPV15, KMV15]. This new metric interpolates between the Wasserstein $W_2$
and the Fisher-Rao metrics. It is defined through a dynamic formulation,
corresponding formally to a Riemannian metric on the space of measures, which
generalizes the formulation of optimal transport due to Benamou and Brenier
[BB00].

In [CSPV15], we proved existence of minimizers in a general setting, presented
the limit models (for extreme values of mass creation/destruction cost) and
proposed a numerical scheme based on first order proximal splitting methods.
We also thoroughly treated the case of two Dirac masses, which was the first
step towards a Lagrangian description of the model. This metric is the
prototypical example for the general framework developed in this article. It
thus enjoys both a dynamic formulation and a static one (Sect. 5.2). It is also
sufficiently simple that its geometry (and in particular its curvature) can be
analyzed in detail, which gives insight into the properties of our generic
class of metrics.

1.2 Contribution 

The starting point of this article is the Wasserstein-Fisher-Rao (WF) metric.
For two non-negative densities $\rho_0, \rho_1$ on a domain
$\Omega \subset \mathbb{R}^d$ it is informally obtained by optimizing

\[ \mathrm{WF}^2(\rho_0,\rho_1) = \inf_{(\rho,v,\alpha)} \int_0^1 \int_\Omega \Big( \tfrac{1}{2} |v(t,x)|^2 + \tfrac{1}{2} \alpha(t,x)^2 \Big) \rho(t,x) \, dx \, dt \tag{1.1} \]

where $\rho$ is a time-dependent density, $v$ is a velocity field that
describes the movement of mass of $\rho$ and $\alpha$ a scalar field that
models local growth and destruction of mass. The triplet $(\rho, v, \alpha)$
must satisfy the following continuity equation with source:

\[ \partial_t \rho + \nabla \cdot (\rho v) = \rho \alpha, \qquad \rho(0,\cdot) = \rho_0, \quad \rho(1,\cdot) = \rho_1. \tag{1.2} \]

This article presents two sets of contributions. The first one (Section 2)
studies in detail the geometry of the WF metric and related functionals. It is
both of independent interest and serves as a motivation to introduce the second
class of contributions. Formula (1.1) is formally interpreted as a particular
case of a family of Riemannian metrics on a semi-direct product of groups
between diffeomorphisms and scalar functions. The diffeomorphisms account for
mass transportation, the scalar fields act as pointwise mass multipliers. The
main result of this part is the formal derivation of a submersion of this
semi-direct product of groups into the space of positive measures, equipped
with a metric from this family (Proposition 2.9). This corresponds to a
generalization to WF of the Riemannian submersion first introduced by Otto in
the optimal transport case [Ott01]. A first application of this result is the
computation of the sectional curvature of the WF space (Proposition 2.15). A
second application is the definition of a Monge-like formulation, i.e. the
computation of WF in terms of a transport diffeomorphism and a pointwise mass
multiplier (Section 2.7).

The second set of contributions introduces and studies a general class of
unbalanced optimal transport metrics, which enjoy both a static and a dynamic
formulation. We introduce in Section 3 a new Kantorovich-like class of static
problems of the form

\[ C_K(\rho_0,\rho_1) = \inf_{(\gamma_0,\gamma_1)} \int_{\Omega \times \Omega} c(x, \gamma_0(x,y), y, \gamma_1(x,y)) \, dx \, dy \tag{1.3} \]

where $(\gamma_0, \gamma_1)$ are two ‘semi-couplings’ between $\rho_0$ and
$\rho_1$, describing, analogously to standard optimal transport, how much mass
is transported between any pair $x, y \in \Omega$. Two semi-couplings are
required to be able to describe changes of mass during transport. The function
$c(x_0, m_0, x_1, m_1)$ determines the cost of transporting a quantity of mass
$m_0$ from $x_0$ to a (possibly different) quantity $m_1$ at $x_1$. It is a
crucial assumption of our approach that $c(x, \cdot, y, \cdot)$ is jointly
positively 1-homogeneous and convex in the two mass arguments. This ensures
that (1.3) can be rigorously defined as an optimization problem over measures
and that the resulting problem is convex. A continuity result and the dual
problem are established (Theorems 3.3 and 3.5). Analogous to standard optimal
transport, when $c$ induces a metric over pairs of location and mass, then (1.3)
defines a metric over non-negative measures (Theorem 3.2).

Then, in Section 4 we look at a family of dynamic problems given by

\[ C_D(\rho_0,\rho_1) = \inf_{(\rho,v,\alpha)} \int_0^1 \int_\Omega f\big(x, \rho(t,x), v(t,x) \cdot \rho(t,x), \alpha(t,x) \cdot \rho(t,x)\big) \, dx \, dt \tag{1.4} \]

where the infimum is again taken over solutions of (1.2). Here,
$f(x, m, v \cdot m, \alpha \cdot m)$ gives the infinitesimal cost of moving a
chunk of mass $m$ at $x$ in direction $v$ while undergoing an infinitesimal
scaling by $\alpha$. Note that in the last two arguments of $f$ we multiply $v$
and $\alpha$ by $m$. This corresponds to the velocity-to-momentum change of
variables proposed in [ ] to obtain a convex problem. Under suitable
assumptions on $f$ (that include (1.1) and go beyond the Riemannian case) we
establish equivalence between (1.3) and (1.4) when $c$ is chosen to be the
‘pointwise distance’ induced by $f$ (Theorem 4.3 and Proposition 4.4).

Finally, we apply those results to two unbalanced optimal transport models.
Section 5.1 introduces a dynamic formulation and gives duality results for a
family of metrics obtained from the optimal partial transport problem. This is
reminiscent of, and generalizes, the results in [PR13, PR14]. The case of the
WF metric is rigorously discussed in Section 5.2. In Section 5.3 it is shown
how standard static optimal transport is obtained as a limit (in the sense of
$\Gamma$-convergence) of the WF metric, thus complementing a previous result of
[ ] obtained for dynamic formulations.

Although motivated by Section 2, Sections 3 to 5 are mathematically
self-contained. Readers not familiar with infinite dimensional Riemannian
geometry therefore do not need to read Section 2 before proceeding to the rest
of the paper.

1.3 Relation with [LMS15a, LMS15b]

After completing this paper, we became aware of the independent work of
[LMS15a, LMS15b]. In these two papers the authors develop and study the same
class of “static” transportation-like problems as here. This huge body of work
contains many theoretical aspects that we do not cover. For instance, measures
defined over more general metric spaces are considered, while we work over
$\mathbb{R}^d$. The construction of [LMS15a] defines three equivalent static
formulations. Their third “homogeneous” formulation is closely related (by a
change of variables) to our “semi-couplings” formulation. Their first
formulation gives an intuitive interpretation of this class of convex programs
as a modification of the original optimal transportation problem where one
replaces the hard marginal constraints by soft penalization using Bregman
divergences. The dual of this first formulation is related to the dual of our
formulation by a logarithmic change of variables. Quite interestingly, the same
idea is used in an informal and heuristic way by [ ] for applications in
machine learning, where soft marginal constraints are the key to stabilizing
numerical results. The authors of [LMS15a, LMS15b] study dynamical formulations
in the Wasserstein-Fisher-Rao setting (that they call the
“Hellinger-Kantorovich” problem). This allows them to make a detailed analysis
of the geodesic structure of this space. In contrast, we study a more general
class of dynamical problems, but restrict our attention to the equivalence with
the static problem. Another original contribution of our work is the proof of
the metric structure (in particular the triangle inequality) for static and
dynamic formulations when the underlying cost over the cone manifold
$\Omega \times \mathbb{R}_+$ is related to a distance. Lastly, our geometric
study of Section 2, and in particular the Riemannian submersion structure, the
explicit sectional curvature computation and the Monge problem appear to be
original contributions. Note that along these lines, the work of [ ] proves a
lower bound on the Alexandrov curvature in the WF case, which in particular
allows these authors to state sufficient conditions for the WF space to have
positive curvature. In a smooth setting, we find similar results in
Proposition 2.15 and Corollary 2.16.

1.4 Preliminaries and Notation 

We denote by $C(X)$ the Banach space of real valued continuous functions on a
compact set $X \subset \mathbb{R}^d$ endowed with the sup norm topology. Its
topological dual is identified with the set of Radon measures, denoted by
$\mathcal{M}(X)$, and the dual norm on $\mathcal{M}(X)$ is the total variation,
denoted by $|\cdot|_{TV}$. Another useful topology on $\mathcal{M}(X)$ is the
weak* topology arising from this duality: a sequence of measures
$(\mu_n)_{n \in \mathbb{N}}$ weak* converges towards $\mu \in \mathcal{M}(X)$ if
and only if for all $u \in C(X)$,
$\lim_{n \to +\infty} \int_X u \, d\mu_n = \int_X u \, d\mu$. According to that
topology, $C(X)$ and $\mathcal{M}(X)$ are topologically paired spaces (the
elements of each space can be identified with the continuous linear forms on
the other); this is a standard setting in convex analysis. We also use the
following notations:

• $\mathcal{M}_+(X)$ is the space of nonnegative Radon measures,
$\mathcal{M}_+^{ac}(X)$ the subset of absolutely continuous measures w.r.t. the
Lebesgue measure and $\mathcal{M}_+^{at}(X)$ the subset of purely atomic
measures.

• For $M$ a given manifold, $TM$ denotes the tangent bundle of $M$ and $T_pM$
the tangent space at a point $p \in M$.

• $\mathrm{Dens}(X)$ is the set of finite Radon measures that have smooth
positive density w.r.t. a reference volume form $\nu$.

• $\mu \ll \nu$ means that the $\mathbb{R}^m$-valued measure $\mu$ is absolutely
continuous w.r.t. the positive measure $\nu$. We denote by
$\frac{d\mu}{d\nu} \in (L^1(X,\nu))^m$ the density of $\mu$ with respect to
$\nu$.

• For a (possibly vector) measure $\mu$, $|\mu| \in \mathcal{M}_+(X)$ is its
variation.

• For a map $\varphi : M \to N$ between two manifolds $M, N$, $T\varphi$ denotes
the tangent map of $\varphi$. For a given Riemannian metric $g$ on $N$, the
pull-back of $g$ by $\varphi$ is denoted by $\varphi^* g$ and defined by
$(\varphi^* g)(x)(v_x, v_x) = g(\varphi(x))(T_x\varphi(v_x), T_x\varphi(v_x))$.

• $T_\# \mu$ is the image measure of $\mu$ through the measurable map
$T : X_1 \to X_2$, also called the pushforward measure. It is given by
$T_\# \mu(A_2) := \mu(T^{-1}(A_2))$. When $X_1 = X_2$ is a manifold and $T$ is
a diffeomorphism, we denote $T_\#$ by $T_*$.

• $\delta_x$ is a Dirac measure of mass 1 located at the point $x$.

• $\iota_C$ is the (convex) indicator function of a convex set $C$ which takes
the value 0 on $C$ and $+\infty$ everywhere else.

• If $(E, E')$ are topologically paired spaces and
$f : E \to \mathbb{R} \cup \{+\infty\}$ is a convex function, $f^*$ is its
Legendre transform, i.e. for $y \in E'$,
$f^*(y) := \sup_{x \in E} \langle x, y \rangle - f(x)$.

• For $n \in \mathbb{N}$ and a tuple of distinct indices $(i_1, \dots, i_k)$,
$i_l \in \{0, \dots, n-1\}$, the map

\[ \mathrm{Proj}_{i_1,\dots,i_k} : \Omega^n \to \Omega^k \tag{1.5} \]

denotes the canonical projection from $\Omega^n$ onto the factors given by the
tuple $(i_1, \dots, i_k)$.

• The truncated cosine is defined by
$\overline{\cos} : z \mapsto \cos(|z| \wedge \frac{\pi}{2})$.

2 The Geometric Formulation in a Riemannian setting 

This section is focussed on Riemannian generalizations of the
Wasserstein-Fisher-Rao (WF) metric. The WF distance, as informally defined in
(1.1) over the space of Radon measures on $\Omega$, is the motivating example
for the geometric formulation of Section 2 and also a simple example for which
an equivalent static formulation exists in the setting of Section 5.2.

This is a prototypical example of metrics over densities that can be written
as

\[ G^2(\rho_0,\rho_1) = \inf_{\rho,v,\alpha} \int_0^1 \Big( \frac{1}{2} \int_\Omega g(x)((v,\alpha),(v,\alpha)) \, d\rho_t(x) \Big) dt \tag{2.1} \]

under the same constraints, where $g(x)$ is a scalar product on
$T_x\Omega \times \mathbb{R}$ and the two factors $v = v(t,x)$ and
$\alpha = \alpha(t,x)$ represent the velocity field and the growth rate. Note
that WF is obtained from (2.1) by choosing
$g(x)((v,\alpha),(v,\alpha)) = 2 f(1, v, \alpha) = |v|^2 + \delta^2 \alpha^2$.
We will see in Section 2.2 that this family of metrics consists exactly of the
Riemannian metrics that satisfy the homogeneity condition formulated in
Definition 2.1. This homogeneity condition appears compulsory in order to
properly define the metric on the space of Radon measures by means of convex
analysis (see Section 4).

While Section 1.2 defines the WF metric over a bounded domain
$\Omega \subset \mathbb{R}^d$, we will assume in the rest of the section that
$\Omega$ is a compact manifold, possibly with smooth boundary.

2.1 Otto’s Riemannian Submersion: Eulerian and Lagrangian 
Formulations 

Standard optimal transport consists of moving one distribution of mass to
another while minimizing a transportation cost, which is an optimization
problem originally formulated in Lagrangian (static) coordinates. In [BB00],
the authors introduced a convex Eulerian formulation (dynamic) which enables
the natural generalization proposed in [CSPV15, KMV15]. The link between the
static and dynamic formulation is made clear using Otto’s Riemannian submersion
[Ott01] which emphasizes the idea of a group action on the space of probability
densities. More precisely, let $\Omega$ be a compact manifold,
$\mathrm{Diff}(\Omega)$ be the group of smooth diffeomorphisms of $\Omega$ and
$\mathrm{Dens}_p(\Omega)$ be the set of probability measures that have smooth
positive density with respect to a reference volume measure $\nu$. We consider
such a probability density denoted by $\rho_0$. Otto proved that the map

\[ \Pi : \mathrm{Diff}(\Omega) \to \mathrm{Dens}_p(\Omega), \qquad \Pi(\varphi) = \varphi_* \rho_0 \]

is a Riemannian submersion of the metric $L^2(\rho_0)$ on
$\mathrm{Diff}(\Omega)$ to the Wasserstein metric on $\mathrm{Dens}_p(\Omega)$.
Therefore, the geodesic problem on $\mathrm{Dens}_p(\Omega)$ can be
reformulated on the group $\mathrm{Diff}(\Omega)$ as the Monge problem,

\[ W_2(\rho_0,\rho_1)^2 := \inf_{\varphi \in \mathrm{Diff}(\Omega)} \Big\{ \int_\Omega \|\varphi(x) - x\|^2 \rho_0(x) \, d\nu(x) \; : \; \varphi_* \rho_0 = \rho_1 \Big\}. \tag{2.2} \]

For an overview on the geometric formulation of optimal transport, we refer the
reader to [KW08] and to [Del09] for a more detailed presentation.

2.2 Admissible Riemannian Metrics 

In our setting, where mass is not only moved but also changed, the group acting
on a mass particle has to include a mass rescaling action in addition to the
transport action. Let us introduce informally the Lagrangian formulation of the
continuity constraint with source associated to (1.2). Let
$m(t)\delta_{x(t)}$ describe a particle of mass $m(t)$ at point $x(t)$. The
continuity constraint with source reads

\[ \begin{cases} \frac{d}{dt} x(t) = v(t, x(t)) \\ \frac{d}{dt} m(t) = \alpha(t, x(t)) \, m(t). \end{cases} \tag{2.3} \]

This states that the mass is dragged along by the vector field $v$ and
simultaneously undergoes a growth process at rate $\alpha$. These equations
also represent the infinitesimal action of the group which is described in more
abstract terms in the next section.
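
The Lagrangian system above can be illustrated numerically. The following is a
minimal Python sketch, not taken from the paper: the function name
`integrate_particle`, the explicit Euler scheme, the step count and the constant
fields in the usage example are all assumptions chosen for illustration.

```python
import math


def integrate_particle(x0, m0, v, alpha, t_end=1.0, n_steps=1000):
    """Explicit Euler integration of dx/dt = v(t, x), dm/dt = alpha(t, x) * m.

    `v` and `alpha` are user-supplied callables (hypothetical test fields);
    the position is a scalar here, i.e. the domain is one-dimensional.
    """
    dt = t_end / n_steps
    x, m, t = x0, m0, 0.0
    for _ in range(n_steps):
        dx = v(t, x)           # transport: the particle follows the velocity field
        dm = alpha(t, x) * m   # growth: the mass is rescaled at rate alpha
        x += dt * dx
        m += dt * dm
        t += dt
    return x, m


if __name__ == "__main__":
    # Unit velocity and constant growth rate 0.5 on [0, 1]:
    # the exact solution is x(1) = 1 and m(1) = exp(0.5).
    x1, m1 = integrate_particle(0.0, 1.0, lambda t, x: 1.0, lambda t, x: 0.5)
    print(x1, m1, math.exp(0.5))
```

With constant fields the position grows linearly and the mass exponentially,
which is the behaviour the two equations encode separately.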

Another important object is the metric used to measure spatial and mass
changes. Note that the metric $g$ introduced in (2.1) defines a unique
Riemannian metric on the product space $\Omega \times \mathbb{R}_+^*$ which
transforms homogeneously under pointwise multiplication. Since the pointwise
multiplication will be used, it is natural to consider this product space as a
trivial principal fiber bundle where the structure group is $\mathbb{R}_+^*$
under multiplication. In Section 4 we will prove an equivalence result between
dynamic and static formulations for a general cost function (see Definition
4.2) which reduces, in the Riemannian case, to this type of metrics. We
therefore define the following admissible class of metrics:

Definition 2.1 (Admissible Riemannian metrics). A smooth Riemannian metric $g$
on $\Omega \times \mathbb{R}_+^*$ will be said to be admissible if the maps
$\Psi_\lambda : (\Omega \times \mathbb{R}_+^*, \lambda g) \to (\Omega \times \mathbb{R}_+^*, g)$,
$\lambda > 0$, defined by $\Psi_\lambda(x, m) = (x, \lambda m)$ are isometries.

Under such a metric, the metric completion of $\Omega \times \mathbb{R}_+^*$ is
the cone over $\Omega$ which we now define.

Definition 2.2 (Cone). The cone over $\Omega$, denoted by
$\mathrm{Cone}(\Omega)$, is the quotient space
$(\Omega \times \mathbb{R}_+) / (\Omega \times \{0\})$. The apex of the cone
$\Omega \times \{0\}$ will be denoted by $\mathcal{S}$.

Proposition 2.1. The metric completion of $(\Omega \times \mathbb{R}_+^*, g)$
for an admissible metric $g$ is the cone $\mathrm{Cone}(\Omega)$.

Proof. By definition, we have
$d((x,m),(x,m/2)) = m^{1/2} d((x,1),(x,1/2))$. Taking $m = 1/2^k$, we get

\[ d((x,1),(x,0)) \le \sum_{k \ge 0} d((x,1/2^k),(x,1/2^{k+1})) = \frac{1}{1 - 1/\sqrt{2}} \, d((x,1),(x,1/2)). \tag{2.4} \]

Moreover, $d((x,m),(y,m)) = m^{1/2} d((x,1),(y,1))$; therefore, $(x,m)$ and
$(y,m)$ have the same limit when $m$ goes to 0. This limit is the apex of the
cone as defined in Definition 2.2. Then, the set
$\mathcal{S} \cup (\Omega \times ]0, m_0])$ is compact since $\Omega$ is
assumed so.

Now, consider a Cauchy sequence $(x_n, m_n)$ for the distance induced by the
admissible metric. It implies that $m_n$ is bounded above since
$d((x_n,m_n),(x_n,0)) = m_n^{1/2} d((x_n,1),(x_n,0))$ and thus $(x_n, m_n)$ has
an accumulation point in the cone. □



Remark. Note that Definition 2.1 and Proposition 2.1 are also valid for Finsler
metrics under minor changes.

Proposition 2.2. Any admissible metric on $\Omega \times \mathbb{R}_+^*$ is
completely defined by its restriction to $\Omega \times \{1\}$. There exist a
metric $g$ on $\Omega$, a 1-form $a \in T^*\Omega$ and a positive function $b$
on $\Omega$ such that

\[ g(x,m) = m \, g(x) + a(x) \, dm + b(x) \frac{dm^2}{m}. \tag{2.5} \]

Proof. For an admissible Riemannian metric one has

\[ \lambda \, g(x,m)((v_x,v_m),(v_x,v_m)) = g(x,\lambda m)((v_x,\lambda v_m),(v_x,\lambda v_m)) \tag{2.6} \]

for all $\lambda > 0$, $(x,m) \in \Omega \times \mathbb{R}_+^*$ and
$(v_x, v_m) \in T_{(x,m)} \Omega \times \mathbb{R}_+^*$. As a consequence, we
have that

\[ g(x,m)((v_x,v_m),(v_x,v_m)) = m \, g(x,1)((v_x,v_m/m),(v_x,v_m/m)). \tag{2.7} \]

Expanding the terms,

\[ g(x,m)((v_x,v_m),(v_x,v_m)) = m \, g(x,1)((v_x,0),(v_x,0)) + 2 \, g(x,1)((v_x,0),(0,v_m)) + \frac{1}{m} \, g(x,1)((0,v_m),(0,v_m)), \tag{2.8} \]

we obtain the desired decomposition and the fact that the admissible metric is
completely defined by $g(x,1)$. □
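
The homogeneity identity (2.6) and the decomposition (2.5) can be checked
numerically at a fixed base point. The sketch below is illustrative only: it
assumes a one-dimensional base, so that $g(x)$, $a(x)$ and $b(x)$ reduce to
three numbers, and it assumes the mixed term of (2.5) contributes
$2\,a\,v_x v_m$, matching the factor 2 in (2.8). The function names are
hypothetical.

```python
def admissible_quadratic_form(g_x, a_x, b_x):
    """Quadratic form of m*g + a dm + b dm^2/m at a fixed point x (1D base).

    Assumed convention: the mixed term is 2 * a_x * v_x * v_m, in line with
    the factor 2 appearing in the expansion (2.8).
    """
    def q(m, v_x, v_m):
        return m * g_x * v_x ** 2 + 2.0 * a_x * v_x * v_m + b_x * v_m ** 2 / m
    return q


def homogeneity_defect(q, m, v_x, v_m, lam):
    """lam * q(m; v_x, v_m) - q(lam * m; v_x, lam * v_m); zero by (2.6)."""
    return lam * q(m, v_x, v_m) - q(lam * m, v_x, lam * v_m)


def is_positive_definite(g_x, a_x, b_x):
    """The matrix [[m*g, a], [a, b/m]] has determinant g*b - a^2 for every m."""
    return g_x > 0 and g_x * b_x - a_x ** 2 > 0


if __name__ == "__main__":
    q = admissible_quadratic_form(2.0, 0.5, 1.5)
    print(homogeneity_defect(q, 0.7, 1.3, -0.4, 3.0))  # close to zero
    print(is_positive_definite(2.0, 0.5, 1.5))
```

Note that the mass variable cancels in the determinant, so positivity only has
to be checked on the slice $m = 1$.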

Note that for a given 1-form $a \in T^*\Omega$ and a positive function $b$ on
$\Omega$, the formula (2.5) defines a metric if and only if its determinant is
everywhere positive. We will also use the short notation

\[ g(x)(v_x, \alpha) := g(x,1)((v_x, v_m/m),(v_x, v_m/m)) \tag{2.9} \]

with $\alpha = v_m/m$ as it was introduced in (2.1).

Using a square root change of variables, this type of metrics can be related
to (generalized) Riemannian cones. Recall that a Riemannian cone (see [Gal79,
BBI01] for instance) on a Riemannian manifold $(\Omega, h)$ is the manifold
$\Omega \times \mathbb{R}_+^*$ endowed with the cone metric
$g_c = m^2 h + dm^2$. The change of variables
$\Psi : (x,m) \mapsto (x, \sqrt{m})$ gives
$\Psi^* g_c = m h + \frac{1}{4m} dm^2$, which is the admissible metric
associated with the initial Wasserstein-Fisher-Rao metric (1.1). This type of
metrics is well-known and we summarize hereafter some important properties:

Proposition 2.3. Let $(\Omega, g)$ be a complete Riemannian manifold and
consider $\Omega \times \mathbb{R}_+^*$ with the admissible metric defined by
$m g + \frac{1}{4m} dm^2$ for $(x,m) \in \Omega \times \mathbb{R}_+^*$. For a
given vector field $X$ on $\Omega$, define its lift on
$\Omega \times \mathbb{R}_+^*$ by $\tilde{X} = (X, 0)$ and denote by $e$ the
vector field defined by $\frac{\partial}{\partial m}$. This Riemannian manifold
has the following properties:

1. Its curvature tensor satisfies $R(\tilde{X}, e) = 0$ and

\[ R(\tilde{X}, \tilde{Y})\tilde{Z} = \big( R_g(X,Y)Z - g(Y,Z)X + g(X,Z)Y, \, 0 \big) \tag{2.10} \]

where $R_g$ denotes the curvature tensor of $(\Omega, g)$.

2. The distance on $\mathrm{Cone}(\Omega)$ is

\[ d((x_0,m_0),(x_1,m_1)) = \big[ m_0 + m_1 - 2\sqrt{m_0 m_1} \cos(d(x_0,x_1) \wedge \pi) \big]^{1/2}. \tag{2.11} \]

Proof. The proof of the first point is in [ ] and the second point can be
found in [BBI01]. Note that the square root change of variables
$\Psi : (x,m) \mapsto (x, \sqrt{m})$ is needed for the application of these
results. □
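
The distance formula (2.11) is straightforward to evaluate. The following
Python sketch is an illustration under assumptions: the base distance is passed
in as a callable, and the one-dimensional Euclidean distance used in the
example is a hypothetical choice, not one made in the text.

```python
import math


def cone_distance(x0, m0, x1, m1, base_distance):
    """Distance (2.11) on Cone(Omega) for the metric m*g + dm^2 / (4m).

    `base_distance` is a callable returning d(x0, x1) on the base space.
    """
    angle = min(base_distance(x0, x1), math.pi)  # truncation d(x0, x1) ^ pi
    squared = m0 + m1 - 2.0 * math.sqrt(m0 * m1) * math.cos(angle)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative round-off


def euclidean_1d(x, y):
    """Hypothetical base distance used only for the example below."""
    return abs(x - y)


if __name__ == "__main__":
    # Same location, masses 1 and 4: pure rescaling, |sqrt(4) - sqrt(1)|.
    print(cone_distance(0.0, 1.0, 0.0, 4.0, euclidean_1d))
    # Base distance beyond pi: the path goes through the apex,
    # giving sqrt(1) + sqrt(4).
    print(cone_distance(0.0, 1.0, 5.0, 4.0, euclidean_1d))
```

The two example calls show the two regimes of the formula: when the points
coincide only the masses matter, and once the base distance exceeds $\pi$ the
cheapest path destroys all mass at the apex and recreates it.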

Note that (as remarked in [Gal79]), for any geodesic $c$ on $\Omega$
parametrized with unit speed, the map
$\phi : \mathbb{C} \setminus \mathbb{R}_- \to \Omega \times \mathbb{R}_+^*$
defined by $\phi(m e^{i\theta}) = (c(\theta), m^2)$ is a local isometry.

Corollary 2.4. If $(\Omega, g)$ has sectional curvature greater than 1, then
$(\Omega \times \mathbb{R}_+^*, m g + \frac{1}{4m} dm^2)$ has non-negative
sectional curvature and more precisely, for $X, Y$ two orthonormal vector
fields on $\Omega$,

\[ K(\tilde{X}, \tilde{Y}) = \frac{1}{m} \big( K_g(X,Y) - 1 \big) \tag{2.12} \]

where $K$ and $K_g$ denote respectively the sectional curvatures of
$\Omega \times \mathbb{R}_+^*$ and $\Omega$.

Although the Riemannian cone over a segment in $\mathbb{R}$ is locally flat,
the curvature still concentrates at the apex of the cone.

In view of applications, it is of practical interest to classify, at least
locally, the space of admissible metrics. The first important remark is that
the metric associated with the WF model on $\mathbb{R}$, of the form
$m g + \frac{1}{4m} dm^2$, is flat. It is a particular case of admissible
metrics that are diagonal, which we define hereafter.

Definition 2.3. A diagonal admissible metric is a metric on
$\Omega \times \mathbb{R}_+^*$ that can be written as
$m g + \frac{c}{m} dm^2$ where $g$ and $c$ are respectively a metric and a
positive function on $\Omega$.

It is possible to exhibit admissible diagonal metrics that have non-zero
sectional curvature when $\Omega \subset \mathbb{R}$. Therefore, admissible
diagonal metrics are in general not isometric to the standard Riemannian cone.
The next proposition gives a characterization of admissible metrics that can be
diagonalized by a fiber bundle isomorphism [Mic08, Section 17] and it shows
that there is a correspondence between diagonal metrics and exact 1-forms on
$\Omega$ given by principal fiber bundle isomorphisms (see [Mic08, Section
18.6] for instance).

Proposition 2.5. An admissible metric
$h(x,m) = m \, h(x) + a(x) \, dm + b(x) \frac{dm^2}{m}$ on
$\Omega \times \mathbb{R}_+^*$ is the pull-back of a diagonal admissible metric
by a principal fiber bundle isomorphism if and only if $\frac{a}{b}$ is an
exact 1-form. More precisely, there exist positive functions $c, \lambda$ on
$\Omega$ and a metric $g$ on $\Omega$ such that
$\Phi^*(m g + \frac{c}{m} dm^2) = h$ where $\Phi(x,m) = (x, \lambda(x) m)$.

See Appendix B for a proof. In particular, the proof shows that in the space
of admissible metrics, locally diagonalizable metrics are in correspondence
with closed 1-forms. Thus, the space of admissible metrics is strictly bigger
than the space of standard cone metrics and even strictly bigger than the space
of diagonal metrics. These statements are to be understood “up to fiber bundle
isomorphisms” which respect the decomposition between space and mass.

2.3 A Semi-direct Product of Groups 

As mentioned before, we denote by $\nu$ a volume form on $\Omega$. We first
denote $\Lambda(\Omega) := \{ \lambda \in C^\infty(\Omega, \mathbb{R}) : \lambda > 0 \}$,
which is a group under pointwise multiplication, and recall that
$\mathrm{Dens}(\Omega)$ is the set of finite Radon measures that have smooth
positive density w.r.t. the reference measure $\nu$. We then define a group
morphism from $\mathrm{Diff}(\Omega)$ into the automorphism group of
$\Lambda(\Omega)$,
$\Psi : \mathrm{Diff}(\Omega) \to \mathrm{Aut}(\Lambda(\Omega))$, by
$\Psi(\varphi) : \lambda \mapsto \varphi^{-1} \cdot \lambda$ where
$\varphi \cdot \lambda := \lambda \circ \varphi^{-1}$ is the usual left action
of the group of diffeomorphisms on the space of functions. The map $\Psi$ is an
antihomomorphism since it reverses the order of the action. The associated
semi-direct product is well-defined and it will be denoted by
$\mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$. We recall the following
properties for $\varphi_1, \varphi_2 \in \mathrm{Diff}(\Omega)$ and
$\lambda_1, \lambda_2 \in \Lambda(\Omega)$,

\[ (\varphi_1, \lambda_1) \cdot (\varphi_2, \lambda_2) = (\varphi_1 \circ \varphi_2, \, (\varphi_2^{-1} \cdot \lambda_1) \lambda_2) \tag{2.13} \]

\[ (\varphi_1, \lambda_1)^{-1} = (\varphi_1^{-1}, \, \varphi_1 \cdot \lambda_1^{-1}). \tag{2.14} \]

Note that this is not the usual definition of a semi-direct product of groups
but it is isomorphic to it. We chose this definition in order to get the
following left-action:

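Proposition 2.6 (Left action). The map $\pi$ defined by

\[ \pi : \big( \mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega) \big) \times \mathrm{Dens}(\Omega) \to \mathrm{Dens}(\Omega), \qquad \pi((\varphi, \lambda), \rho) := (\varphi \cdot \lambda) \, \varphi_* \rho = \varphi_*(\lambda \rho) \]

is a left-action of the group
$\mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$ on the space of densities.

Proof. This can be checked by the following elementary calculation:

\[ \begin{aligned} \pi((\varphi_1, \lambda_1) \cdot (\varphi_2, \lambda_2), \rho) &= \pi((\varphi_1 \circ \varphi_2, (\varphi_2^{-1} \cdot \lambda_1) \lambda_2), \rho) \\ &= \big( (\varphi_1 \circ \varphi_2) \cdot ((\varphi_2^{-1} \cdot \lambda_1) \lambda_2) \big) \, (\varphi_1 \circ \varphi_2)_* \rho \\ &= (\varphi_1 \cdot \lambda_1) \, ((\varphi_1 \circ \varphi_2) \cdot \lambda_2) \, (\varphi_1 \circ \varphi_2)_* \rho \\ &= \pi((\varphi_1, \lambda_1), \pi((\varphi_2, \lambda_2), \rho)). \end{aligned} \]

The identity element in $\mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$ is
$(\mathrm{Id}, 1)$ and one trivially has:

\[ \pi((\mathrm{Id}, 1), \rho) = (\mathrm{Id} \cdot 1) \, \mathrm{Id}_* \rho = \rho. \]

□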
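
The group law (2.13) and the left-action property can be tested numerically in
a simplified setting. The sketch below is an illustration under strong
assumptions: the domain is the real line, diffeomorphisms are restricted to
increasing affine maps so that inverses and Jacobians are explicit, and
densities are plain Python callables. The class `Affine` and the functions
`product` and `act` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """Increasing affine map x -> a*x + b (a > 0), standing in for a diffeomorphism."""
    a: float
    b: float

    def __call__(self, x):
        return self.a * x + self.b

    def inverse(self):
        return Affine(1.0 / self.a, -self.b / self.a)

    def compose(self, other):
        """Return self o other."""
        return Affine(self.a * other.a, self.a * other.b + self.b)


def product(g1, g2):
    """Group law (2.13): (phi1, lam1).(phi2, lam2) = (phi1 o phi2, (lam1 o phi2) * lam2)."""
    (phi1, lam1), (phi2, lam2) = g1, g2
    return (phi1.compose(phi2), lambda x: lam1(phi2(x)) * lam2(x))


def act(g, rho):
    """Action pi((phi, lam), rho) = phi_*(lam * rho), written on densities."""
    phi, lam = g
    inv = phi.inverse()
    # Pushforward of a density in 1D: evaluate at phi^{-1}(y), multiply by |d phi^{-1}/dy|.
    return lambda y: lam(inv(y)) * rho(inv(y)) * inv.a


if __name__ == "__main__":
    g1 = (Affine(2.0, 1.0), lambda x: 1.0 + x * x)
    g2 = (Affine(0.5, 0.25), lambda x: math.exp(0.1 * x))
    rho = lambda y: math.exp(-y * y)
    lhs = act(product(g1, g2), rho)
    rhs = act(g1, act(g2, rho))
    print(lhs(0.0), rhs(0.0))  # the two values agree up to round-off
```

Restricting to affine maps keeps the example exact; for general
diffeomorphisms the inverse and its Jacobian would have to be computed
numerically.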
2.4 Generalization of Otto’s Riemannian Submersion

In this section, we define useful notions to obtain the generalization of
Otto’s result. The next definition is a simple change of variables on the
tangent space of the group. It represents the change between the Lagrangian and
Eulerian points of view.

Definition 2.4 (Right-trivialization). Let $H$ be a group and a smooth manifold
at the same time, possibly of infinite dimension. The right-reduction of $TH$
is the bundle isomorphism $\tau : TH \to H \times T_{\mathrm{Id}} H$ defined by
$\tau(h, X_h) := (h, X_h \cdot h^{-1})$, where $X_h$ is a tangent vector at
point $h$, $X_h \cdot h^{-1}$ denotes its image under the tangent map of
$\mathcal{R}_{h^{-1}}$, and $\mathcal{R}_{h^{-1}} : H \to H$ is the right
multiplication by $h^{-1}$, namely $\mathcal{R}_{h^{-1}}(f) = f h^{-1}$ for all
$f \in H$.

In the finite dimensional case, we would have chosen to work with $H$ a Lie
group; however, in infinite dimensions, being a Lie group is too restrictive as
shown by Omori [Omo78]. For instance, in fluid dynamics, the right-trivialized
tangent vector $X_h \cdot h^{-1}$ is the spatial or Eulerian velocity (the
vector field) and $X_h$ is the Lagrangian velocity. Note that most of the time,
this right-trivialization map is continuous but not differentiable due to a
loss of smoothness of the right composition (see [ ]).

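Example 2.1. For the semi-direct product of groups defined above, we have

\[ \tau((\varphi, \lambda), (X_\varphi, X_\lambda)) = \big( (\varphi, \lambda), (X_\varphi \circ \varphi^{-1}, \, \varphi \cdot (X_\lambda \lambda^{-1})) \big), \tag{2.15} \]

or equivalently,

\[ \tau((\varphi, \lambda), (X_\varphi, X_\lambda)) = \big( (\varphi, \lambda), (X_\varphi \circ \varphi^{-1}, \, (X_\lambda \lambda^{-1}) \circ \varphi^{-1}) \big). \tag{2.16} \]

We will denote by $(v, \alpha)$ an element of the tangent space
$T_{(\mathrm{Id},1)} \mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$.
Any path on the group can be parametrized by its initial point and its
right-trivialized tangent vector. The reconstruction equation reads

\[ \begin{cases} \partial_t \varphi(t,x) = v(t, \varphi(t,x)) \\ \partial_t \lambda(t,x) = \alpha(t, \varphi(t,x)) \, \lambda(t,x) \end{cases} \tag{2.17} \]

for given initial conditions $\varphi(0,x)$ and $\lambda(0,x)$. Note that this
system recovers equation (2.3).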

We state without proof a result that will be needed in the Kantorovich
formulation and which is a straightforward consequence of a Cauchy-Lipschitz
result, whose proof can be found in [ R97].

Proposition 2.7. If $v \in L^1([0,T], W^{1,\infty}(\Omega))$, then the first
equation in (2.17) has a unique solution in $W^{1,\infty}(\Omega)$. If, in
addition, $\alpha \in L^\infty(\Omega)$, then the system (2.17) has a unique
solution.

We also need the notion of infinitesimal action associated with a group action.

Definition 2.5 (Infinitesimal action). For a smooth left action of $H$ on a
manifold $M$, the infinitesimal action is the map
$T_{\mathrm{Id}} H \times M \to TM$ defined by

\[ \xi \cdot q := \frac{d}{dt}\Big|_{t=0} \exp(\xi t) \cdot q \in T_q M \tag{2.18} \]

where $\exp(\xi t)$ is the solution to $\dot{h} = \xi \cdot h$ and
$h(0) = \mathrm{Id}$.

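Example 2.2. For $\mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$ the
application of the definition gives
$(v, \alpha) \cdot \rho = -\nabla \cdot (v \rho) + \alpha \rho$. Indeed, one has

\[ (\varphi(t), \lambda(t)) \cdot \rho = \mathrm{Jac}(\varphi(t)^{-1}) \, (\lambda(t) \rho) \circ \varphi^{-1}(t). \]

First recall that $\partial_t \varphi(t) = v \circ \varphi(t)$ and
$\partial_t \lambda = \alpha \lambda(t)$. Once evaluated at time $t = 0$ where
$\varphi(0) = \mathrm{Id}$ and $\lambda(0) = 1$, the differentiation with
respect to $\varphi$ gives $-\nabla \cdot (v \rho)$ and the second term
$\alpha \rho$ is given by the differentiation with respect to $\lambda$.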
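
The formula for the infinitesimal action can be checked by finite differences
in a case where the flow is explicit. The sketch below is illustrative and
rests on assumptions not made in the text: the domain is the real line, the
velocity field is linear, $v(x) = Cx$, the growth rate is a constant, and the
density is a Gaussian. The constants `C` and `ALPHA` and all function names are
hypothetical.

```python
import math

C = 0.7       # assumed velocity field v(x) = C * x, with flow phi(t)(x) = exp(C t) x
ALPHA = -0.3  # assumed constant growth rate, so that lambda(t) = exp(ALPHA t)


def rho(y):
    return math.exp(-y * y)


def d_rho(y):
    return -2.0 * y * math.exp(-y * y)


def acted_density(t, y):
    """Density of (phi(t), lambda(t)) . rho = Jac(phi^{-1}) (lambda rho) o phi^{-1}."""
    jac_inverse = math.exp(-C * t)
    return jac_inverse * math.exp(ALPHA * t) * rho(math.exp(-C * t) * y)


def infinitesimal_action(y):
    """Closed form -d/dy (v rho) + alpha rho for v(y) = C * y."""
    return -(C * rho(y) + C * y * d_rho(y)) + ALPHA * rho(y)


def time_derivative_at_zero(y, h=1e-5):
    """Central finite difference of t -> acted_density(t, y) at t = 0."""
    return (acted_density(h, y) - acted_density(-h, y)) / (2.0 * h)


if __name__ == "__main__":
    for y in (-1.0, 0.0, 0.5, 2.0):
        print(y, time_derivative_at_zero(y), infinitesimal_action(y))
```

The transport term comes from differentiating the Jacobian and the composition
with $\varphi^{-1}$, and the source term from differentiating $\lambda$, which
mirrors the two contributions identified in Example 2.2.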


We now recall a standard construction to obtain Riemannian submersions from a
transitive group action in the situation where the isotropy subgroups are
conjugate to each other. The next proposition is a reformulation of [Mic08,
Claim of Section 29.21] which is concerned with the finite dimensional case. We
formally apply the result in our context, which is infinite dimensional.

Proposition 2.8. Suppose that a smooth left action of a Lie group $H$ on a
manifold $M$ is transitive and such that for every $\rho \in M$, the
infinitesimal action $\xi \mapsto \xi \cdot \rho$ is a surjective map. Let
$\rho_0 \in M$ and let $G$ be a Riemannian metric on $H$ that can be written as:

\[ G(h)(X_h, X_h) = g(h \cdot \rho_0)(X_h \cdot h^{-1}, X_h \cdot h^{-1}) \tag{2.19} \]

for $g(h \cdot \rho_0)$ an inner product on $T_{\mathrm{Id}} H$. Let
$X_\rho \in T_\rho M$ be a tangent vector at point
$h \cdot \rho_0 = \rho \in M$; we define the Riemannian metric $\bar{g}$ on $M$
by

\[ \bar{g}(\rho)(X_\rho, X_\rho) := \min_{\xi \in T_{\mathrm{Id}} H} g(\rho)(\xi, \xi) \quad \text{under the constraint } X_\rho = \xi \cdot \rho, \tag{2.20} \]

where $\xi = X_h \cdot h^{-1}$.

Then, the map $\pi_0 : H \to M$ defined by $\pi_0(h) = h \cdot \rho_0$ is a
Riemannian submersion of the metric $G$ on $H$ to the metric $\bar{g}$ on $M$.

Note that, by hypothesis, the infinitesimal action is supposed to be surjec¬ 
tive, therefore the optimization set is not empty and it needs to be checked that 
the infimum is attained (in infinite dimensions). This will be done in Proposi¬ 
tion 2.12. 

Note also that the submersion can be rewritten as the quotient map from $H$
into the space of right-cosets $\pi : H \to H_0 \backslash H$ where $H_0$ is the isotropy subgroup
of $\rho_0$ in $H$. Therefore, other fibers of the submersion are right-cosets of the
subgroup $H_0$ in $H$.

We now apply this construction to the action of the semi-direct product of 
group onto the space of densities in order to retrieve the class of WF metrics: 
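We choose a reference smooth density $\rho_0$ and for a general admissible metric
we define the Riemannian metric we will use on $\mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$ by, denoting
$\varphi \cdot \lambda\, \varphi_*\rho_0$ by $\rho$ and using the same notation for the infinitesimal action in (2.2),

\[
\begin{aligned}
G(\varphi, \lambda)\big((X_\varphi, X_\lambda), (X_\varphi, X_\lambda)\big)
 &= \frac12 \int_\Omega g(x)\big((v(x), \alpha(x)), (v(x), \alpha(x))\big)\, \rho(x)\, dx \\
 &= \frac12 \int_\Omega g\big((X_\varphi \circ \varphi^{-1}, (X_\lambda \lambda^{-1}) \circ \varphi^{-1}), (X_\varphi \circ \varphi^{-1}, (X_\lambda \lambda^{-1}) \circ \varphi^{-1})\big)\, \rho\, dx
\end{aligned}
\tag{2.21}
\]

where $(X_\varphi, X_\lambda) \in T_{(\varphi,\lambda)} \mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$ is a tangent vector at $(\varphi, \lambda)$. Recall
that $g(x)$ is an inner product on $T_x\Omega \times \mathbb{R}$ that depends smoothly on $x$ as defined
in (2.9). The initial WF model reads

\[
\begin{aligned}
G(\varphi, \lambda)\big((X_\varphi, X_\lambda), (X_\varphi, X_\lambda)\big)
 &= \frac12 \int_\Omega |v(x)|^2 \rho(x)\, dx + \frac{\delta^2}{2} \int_\Omega \alpha(x)^2 \rho(x)\, dx \\
 &= \frac12 \int_\Omega |X_\varphi \circ \varphi^{-1}|^2\, \varphi \cdot \lambda\, \varphi_*\rho_0(x)\, dx
  + \frac{\delta^2}{2} \int_\Omega \big(\varphi \cdot (X_\lambda \lambda^{-1})\big)^2\, \varphi \cdot \lambda\, \varphi_*\rho_0(x)\, dx.
\end{aligned}
\tag{2.22}
\]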


At a formal level, we thus get, for $\rho_0$ a measure of finite mass and which has
a smooth density w.r.t. the reference measure $\nu$ and for the general metric $G$
defined above:


Proposition 2.9 (Riemannian Submersion). Let $\rho_0 \in \mathrm{Dens}(\Omega)$ and
$\pi_0 : \mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega) \to \mathrm{Dens}(\Omega)$ be the map defined by $\pi_0(\varphi, \lambda) := \varphi_*(\lambda\rho_0)$.

Then, the map $\pi_0$ is formally a Riemannian submersion of the metric $G$ on
the group $\mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$ to the metric WF on the space of densities $\mathrm{Dens}(\Omega)$.

This proposition is formal in the sense that we do not know if the metrics $G$
and WF and the map $\pi_0$ are smooth or not for some well chosen topologies and
if the horizontal lift is well defined. We address the smoothness of $G$ in the next
section.


2.5 Curvature of $\mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$

In this section, we are interested in curvature properties of the space
$\mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$. Since we want to work in a smooth setting, we will work in a stronger
topology on the group than the one defined by the metric $G$. Therefore, we will
use the definition of a weak Riemannian metric [EM70, Section 9] that we recall
below.

Definition 2.6 (Weak metric). Let $X$ be a Hilbert manifold modeled on a Hilbert
space $H$. A weak Riemannian metric $g$ on $X$ is a smooth map $x \in X \mapsto g(x)$
into the space of positive definite bilinear forms on $T_xX$.

Note that the inner product on $T_xX$ need not define the topology on $T_xX$ since
it can be weaker than the scalar product on $H$. In order to give a rigorous meaning
to the next lemma, we will work on the group of Sobolev diffeomorphisms $\mathrm{Diff}^s(\Omega)$
for $s > d/2 + 1$ and $\Lambda^s(\Omega) = \{f \in H^s(\Omega) : f > 0\}$. We refer to [ /13] for a
more detailed presentation of $\mathrm{Diff}^s(\Omega)$ and we only recall that it is contained in
the group of $C^1$ diffeomorphisms of $\Omega$. We first prove a lemma that shows that
the metric $G$ is a weak Riemannian metric on $\mathrm{Diff}^s(\Omega) \ltimes_\Psi \Lambda^s(\Omega)$.

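Lemma 2.10. On $\mathrm{Diff}^s(\Omega) \ltimes_\Psi \Lambda^s(\Omega)$, one has

\[
G(\varphi, \lambda)\big((X_\varphi, X_\lambda), (X_\varphi, X_\lambda)\big) =
\frac12 \int_\Omega g(\varphi(x), \lambda(x))\big((X_\varphi(x), X_\lambda(x)), (X_\varphi(x), X_\lambda(x))\big)\, \rho_0(x)\, d\nu(x), \tag{2.23}
\]

which is a weak Riemannian metric.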

Remark. Since $G$ is only a weak Riemannian metric, the Levi-Civita connection
does not necessarily exist, as explained in [EM70] or in [ ].

Proof. In the definition of the metric $G$, we make the change of variables by $\varphi$,
which is allowed since $\varphi \in \mathrm{Diff}^s(\Omega)$, and use the definition of an admissible metric to
obtain formula (2.23). Since $\Omega$ is compact, $\lambda$ attains its strictly positive lower
bound. In addition, using the fact that $g$ is a smooth function and $H^s(\Omega)$ is a
Hilbert algebra, the metric is also smooth. □

Note that the formulation (2.23) shows that this metric is an $L^2$ metric on
the space of functions from $\Omega$ into $\Omega \times \mathbb{R}^*_+$ endowed with the Riemannian metric $g$.
Then, the group $\mathrm{Diff}^s(\Omega) \ltimes_\Psi \Lambda^s(\Omega)$ is an open subset of $H^s(\Omega, \Omega \times \mathbb{R}^*_+)$. These
functional spaces have been studied in [ ] as manifolds of mappings and
they prove, in particular, the existence of a Levi-Civita connection for $\mathrm{Diff}^s(\Omega)$
endowed with an $L^2$ metric.

Theorem 2.11 (Sectional curvature of the group). Let $\Omega \times \mathbb{R}^*_+$ be endowed with
an admissible Riemannian metric $g$ and $\rho$ be a density on $\Omega$. Let $X, Y$ be two
smooth vector fields on $\mathrm{Diff}^s(\Omega) \ltimes_\Psi \Lambda^s(\Omega)$ which are orthogonal for the $L^2(\Omega, \rho)$
scalar product on $H^s(\Omega, \Omega \times \mathbb{R}^*_+)$. Denoting $\mathcal{K}_p$ the curvature tensor of $G$ at point
$p = (\varphi, \lambda)$, one has

\[
\mathcal{K}_p(X_p, Y_p) =
\int_\Omega K_{p(x)}(X_p(x), Y_p(x)) \big(|X_p(x)|^2 |Y_p(x)|^2 - \langle X_p(x), Y_p(x)\rangle^2\big)\, \rho(x)\, d\nu(x) \tag{2.24}
\]

where $X_p := X(p)$ and $\langle\cdot,\cdot\rangle$ and $|\cdot|$ stand for the metric $g$. In addition, $K_y$
denotes the sectional curvature of $(\Omega \times \mathbb{R}^*_+, g)$ at the point $y \in \Omega \times \mathbb{R}^*_+$.

Proof. Since $\Omega$ is compact, the appendix in [ 189] can be applied and it gives
the result. □

Remark. It can be useful for the understanding of Formula (2.24) to recall some
facts that can be found in [ ] or [Mis! | and [ ]. Let us denote $M := \Omega$
and $N := \Omega \times \mathbb{R}^*_+$. The first step of the proof of Theorem 2.11 is the existence of the Levi-
Civita connection, which is a direct adaptation of [EM70, Section 9]. Denote by
$\pi_1 : TTN \to TN$ the canonical projection. As recalled in [EM70, Section 2],
one has, for $M, N$ smooth compact manifolds with smooth boundaries,

\[
T_f H^s(M,N) = \{g \in H^s(M, TN) : \pi_1 \circ g = f\},
\]

and

\[
TTH^s(M,N) = \{Y \in H^s(M, TTN) : \pi_1 \circ Y \in TH^s(M,N)\}.
\]

Denote by $\mathcal{K}$ the connector associated with the Levi-Civita connection of $g$; a
careful detailed presentation of connectors can be found in [ es 78 ]. The Levi-
Civita connection $\tilde\nabla$ on $H^s(M,N)$ endowed with the $L^2$ metric with respect to
the metric $g$ on $N$ and the volume form $\mu_0$ on $M$ is given by:

\[
\tilde\nabla_X Y = \mathcal{K} \circ TY \circ X. \tag{2.25}
\]

The result on the curvature tensor can be deduced from these facts. 

2.6 Curvature of $\mathrm{Dens}(\Omega)$

This section is concerned with the formal computation of the curvature of $\mathrm{Dens}(\Omega)$.
The WF metric can be proven to be a weak Riemannian metric on the space of
densities of $H^s$ regularity. However, the Levi-Civita connection does not exist. These two
facts are proven in Appendix A. Moreover, in this context, the submersion defined
in Section 2.4 is not smooth due to a loss of regularity. The rest of the
section will thus consist in formal computations.

In order to apply O’Neill’s formula, we need to compute the horizontal lift
of a vector field on $\mathrm{Dens}(\Omega)$. In this case of a left action, there is a natural
extension of the horizontal lift of a tangent vector at point $\rho \in \mathrm{Dens}(\Omega)$. Recall
that the horizontal lift is defined by formula (2.20). The following proposition is
straightforward:

Proposition 2.12 (Horizontal lift). Let $\rho \in \mathrm{Dens}(\Omega)$ be a smooth density and
$X_\rho \in C^\infty(\Omega, \mathbb{R})$ be a smooth function that represents a tangent vector at the
density $\rho$. The horizontal lift at $(\mathrm{Id}, 1)$ of $X_\rho$ is given by $(\nabla\Phi, \Phi)$ where $\Phi$ is the
solution to the elliptic partial differential equation:

\[
-\nabla \cdot (\rho \nabla \Phi) + \Phi\rho = X_\rho, \tag{2.26}
\]

with homogeneous Neumann boundary conditions.

Proof. Using the formula (2.20), the horizontal lift of the tangent vector $X_\rho$ is
given by the minimization of the norm of a tangent vector $(v, \alpha)$ at $(\mathrm{Id}, 1)$

\[
\inf_{v, \alpha} \frac12 \int_\Omega g((v, \alpha), (v, \alpha))\, \rho\, d\nu(x), \tag{2.27}
\]

under the constraint $-\nabla \cdot (\rho v) + \alpha\rho = X_\rho$. This is a standard projection problem
for the space $L^2(\Omega, \mathbb{R}^d) \times L^2(\Omega, \mathbb{R})$ endowed with the scalar product defined
in (2.27) (recall that $\rho$ is positive on a compact manifold). The existence of a
minimizer is thus guaranteed and there exists a Lagrange multiplier $\Phi \in L^2(\Omega, \mathbb{R})$
such that the minimizer will be of the form $(\nabla\Phi, \Phi)$. Therefore, $\Phi$ is the solution
to the elliptic partial differential equation (2.26). By elliptic
regularity theory, the solution $\Phi$ is smooth. □
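
To make the elliptic problem (2.26) concrete, the sketch below solves its one-dimensional analogue $-(\rho\Phi')' + \rho\Phi = X_\rho$ on $[0,1]$ with zero-flux (Neumann) boundary conditions, using a cell-centered finite-volume discretization and the Thomas algorithm. The function name `horizontal_lift_1d`, the grid layout and the arithmetic averaging of $\rho$ at cell faces are choices made for this illustration only.

```python
import math

def horizontal_lift_1d(rho, X, h):
    """Solve -(rho * Phi')' + rho * Phi = X on a uniform cell-centered grid.

    rho, X: lists of cell values; h: cell width. Zero flux at both ends
    encodes the homogeneous Neumann condition. Returns the list Phi.
    """
    n = len(rho)
    # Values of rho at cell faces; the two boundary faces carry no flux.
    face = [0.0] + [0.5 * (rho[i] + rho[i + 1]) for i in range(n - 1)] + [0.0]
    lower = [-face[i] / h**2 for i in range(n)]
    upper = [-face[i + 1] / h**2 for i in range(n)]
    diag = [(face[i] + face[i + 1]) / h**2 + rho[i] for i in range(n)]

    # Thomas algorithm for the tridiagonal system (diagonally dominant here).
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = upper[0] / diag[0]
    dp[0] = X[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (X[i] - lower[i] * dp[i - 1]) / denom
    phi = [0.0] * n
    phi[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        phi[i] = dp[i] - cp[i] * phi[i + 1]
    return phi
```

With $\rho \equiv 1$ and $X_\rho(x) = (\pi^2 + 1)\cos(\pi x)$, the exact solution is $\Phi(x) = \cos(\pi x)$; the horizontal lift is then $(v, \alpha) = (\Phi', \Phi)$.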

In order to compute the curvature, we only need to evaluate it on any horizontal
lift that projects to $X_\rho$ at point $\rho$. There is a natural lift in this situation
given by the right-invariant vector field on $\mathrm{Diff}^s(\Omega) \ltimes_\Psi \Lambda^s(\Omega)$.

Definition 2.7. Let $(v, \alpha) \in T_{(\mathrm{Id},1)} \mathrm{Diff}^s(\Omega) \ltimes_\Psi \Lambda^s(\Omega)$ be a tangent vector. The
associated right-invariant vector field $\xi_{(v,\alpha)}$ is given by

\[
\xi_{(v,\alpha)}(\varphi, \lambda) := \big((\varphi, \lambda), (v \circ \varphi, \alpha \circ \varphi\, \lambda)\big). \tag{2.28}
\]

Remark. Note that, due to the loss of smoothness of the right composition, this
vector field $\xi_{(v,\alpha)}$ is smooth for the $H^s$ topology if and only if $(v, \alpha)$ is $C^\infty$.

Last, we need the Lie bracket of the horizontal vector fields on the group. In 
the case of right-invariant vector fields on the group, their Lie bracket is the right- 
invariant vector field associated with the Lie bracket on the manifold. Therefore, 
we have: 


Proposition 2.13. Let $(v_1, \alpha_1)$ and $(v_2, \alpha_2)$ be two tangent vectors at identity.
Then,

\[
[(v_1, \alpha_1), (v_2, \alpha_2)] = \big([v_1, v_2],\ \nabla\alpha_1 \cdot v_2 - \nabla\alpha_2 \cdot v_1\big), \tag{2.29}
\]

where $[v_1, v_2]$ denotes the Lie bracket on vector fields, and therefore,

\[
[\xi_{(v_1,\alpha_1)}, \xi_{(v_2,\alpha_2)}] = \xi_{([v_1,v_2],\ \nabla\alpha_1 \cdot v_2 - \nabla\alpha_2 \cdot v_1)}. \tag{2.30}
\]

Thus, applying this formula to horizontal vector fields gives

Corollary 2.14. Let $\rho$ be a smooth density and $X_1, X_2$ be two tangent vectors
at $\rho$, $\Phi_1, \Phi_2$ be the corresponding solutions of (2.26). We then have

\[
[\xi_{(\nabla\Phi_1,\Phi_1)}, \xi_{(\nabla\Phi_2,\Phi_2)}] = \xi_{([\nabla\Phi_1,\nabla\Phi_2],\ 0)}. \tag{2.31}
\]

We formally apply O’Neill’s formula to obtain a similar result to standard
optimal transport. This formal computation could probably be made more rigorous
following [Lot08] in a smooth context or following [ ]. It is however
not possible to apply the O’Neill formula developed in [ ] in an infinite
dimensional setting due to the lack of regularity of the submersion and the
non-existence of the Levi-Civita connection as shown in Appendix A.2.

Proposition 2.15 (Sectional curvature of WF). Let $\rho$ be a smooth density and
$X_1, X_2$ be two orthonormal tangent vectors at $\rho$ and

\[
Z_1 = (\nabla\Phi_1, \Phi_1) = \xi_1((\mathrm{Id}, 1)), \qquad Z_2 = (\nabla\Phi_2, \Phi_2) = \xi_2((\mathrm{Id}, 1))
\]

their corresponding right-invariant horizontal lifts on the group. If O’Neill’s formula
can be applied, the sectional curvature of $\mathrm{Dens}(\Omega)$ at point $\rho$ is given by

\[
K(\rho)(X_1, X_2) = \int_\Omega K_{(x,1)}(Z_1(x), Z_2(x))\, w(Z_1(x), Z_2(x))\, \rho(x)\, d\nu(x)
 + \frac34 \big\| [Z_1, Z_2]^V \big\|^2 \tag{2.32}
\]

where

\[
w(Z_1(x), Z_2(x)) = g(x)(Z_1(x), Z_1(x))\, g(x)(Z_2(x), Z_2(x)) - g(x)(Z_1(x), Z_2(x))^2
\]

and $[Z_1, Z_2]^V$ denotes the vertical projection of $[Z_1, Z_2]$ at identity and $\|\cdot\|$ denotes
the norm at identity.

Proof. This is the application of O’Neill’s formula [ mn99, Corollary 6.2] and
Proposition 2.9 at the reference density $\rho$ together with Theorem 2.11 on the
vector fields $\xi_1, \xi_2$ on the group. □

Remark. It is important to insist on the fact that we only compute the "local"
sectional curvature. We have seen that the geometry of space and mass is that 
of a Riemannian cone, in which the curvature concentrates at the apex, although 
the Riemannian cone can be locally flat. Obviously, in this infinite dimensional 
context, the sectional curvature only gives information in smooth neighborhoods 
of the density. 

Corollary 2.16. Let $(\Omega, g)$ be a compact Riemannian manifold of sectional curvature
bounded below by 1, then the sectional curvature of $(\mathrm{Dens}(\Omega), \mathrm{WF})$ is
nonnegative.

We stated this Corollary due to the interest in displacement convexity of
the Boltzmann entropy (see [Vil09, Corollary 17.19]). In the case where $\Omega$ is
the (Euclidean) sphere, the Riemannian cone with the standard metric $\mathrm{Cone}(\Omega)$
is flat and a finer characterisation of null sectional curvature for WF can be
given. Namely, the sectional curvature vanishes if and only if $\nabla\Phi_1$ and $\nabla\Phi_2$ are
commuting vector fields on the sphere.

2.7 The Corresponding Monge Formulation 

In this section, we discuss a formal Monge formulation that will motivate the 
development of the corresponding Kantorovich formulation in Section 3. 

Let us recall an important property of a Riemannian submersion

\[
\pi : (M, g_M) \to (B, g_B).
\]

Every horizontal lift of a geodesic on the base space $B$ is a geodesic in $M$. In
turn, given any two points $p, q \in B$, any length minimizing geodesic between
the fibers $\pi^{-1}(p)$ and $\pi^{-1}(q)$ projects down onto a length minimizing geodesic
on $B$ between $p$ and $q$. From the point of view of applications, it can be
interesting either to compute the geodesic downstairs and then lift it up horizontally, or
to go the other way.

In the context of this generalized optimal transport model, the Riemannian
submersion property is shown in Proposition 2.9. Moreover, the metric
on $\mathrm{Diff}(\Omega) \ltimes_\Psi \Lambda(\Omega)$ is an $L^2$ metric as proven in Lemma 2.10, which is
particularly simple. Therefore, it is possible to formulate the corresponding Monge
problem

\[
\mathrm{WF}(\rho_0, \rho_1) = \inf_{(\varphi,\lambda)} \big\{ \|(\varphi, \lambda) - (\mathrm{Id}, 1)\|_{L^2(\rho_0)} : \varphi_*(\lambda\rho_0) = \rho_1 \big\}. \tag{2.33}
\]

Let us denote by $d$ the distance on $\Omega \times \mathbb{R}^*_+$ associated with an admissible
Riemannian metric, then we have:

\[
\|(\varphi, \lambda) - (\mathrm{Id}, 1)\|^2_{L^2(\rho_0)} = \int_\Omega d\big((\varphi(x), \lambda), (x, 1)\big)^2 \rho_0(x)\, d\nu(x). \tag{2.34}
\]

In the case of a standard Riemannian cone with the metric $m\,g + \frac{1}{4m}\, dm^2$,
Proposition 2.3 gives the explicit expression of the distance which gives

\[
\|(\varphi, \lambda) - (\mathrm{Id}, 1)\|^2_{L^2(\rho_0)} = \int_\Omega \Big(1 + \lambda - 2\sqrt{\lambda}\, \cos\big(d(\varphi(x), x) \wedge \pi\big)\Big)\, \rho_0(x)\, d\nu(x). \tag{2.35}
\]
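
The integrand of (2.35) is the squared cone distance between $(\varphi(x), \lambda)$ and $(x, 1)$. As a small illustration, the sketch below evaluates this squared distance and a discrete version of the Monge cost for a measure $\rho_0 = \sum_i w_i \delta_{x_i}$. The helper names `cone_dist_sq` and `monge_cost`, and the discretization itself, are assumptions made for the example.

```python
import math

def cone_dist_sq(d, m0, m1):
    """Squared distance on the standard cone between (x0, m0) and (x1, m1).

    d is the base distance d(x0, x1); the angle is truncated at pi.
    """
    return m0 + m1 - 2.0 * math.sqrt(m0 * m1) * math.cos(min(d, math.pi))

def monge_cost(points, weights, phi, lam, base_dist):
    """Discrete analogue of (2.35) for rho_0 = sum_i weights[i] * delta_{points[i]}.

    phi: map x -> phi(x); lam: map x -> lambda(x) > 0;
    base_dist: distance on the base space (all hypothetical callables).
    """
    total = 0.0
    for x, w in zip(points, weights):
        total += cone_dist_sq(base_dist(phi(x), x), 1.0, lam(x)) * w
    return total
```

Two limiting cases are easy to read off: with no displacement the cost of rescaling the mass by $\lambda$ is $(1 - \sqrt{\lambda})^2$ per unit of mass, and once the displacement exceeds $\pi$ it saturates at $(1 + \sqrt{\lambda})^2$.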

From a variational calculus point of view, it is customary to pass from the 
Monge formulation to its relaxation. So, instead of making rigorous statements 
on this Monge formulation, we will directly work on the Kantorovich formulation 
in the next sections. However, in the next sections, we do not restrict our study to 
Riemannian costs and we extend it to general dynamical costs that are introduced 
in Definition 4.2. As a motivation for this generalization we can mention the case 
of L p norms for p > 1, namely: 

\[
G(\varphi, \lambda)(X_\varphi, X_\lambda) = \frac1p \int_\Omega |v(x)|^p \rho(x)\, d\nu(x) + \frac1p \int_\Omega |\alpha(x)|^p \rho(x)\, d\nu(x). \tag{2.36}
\]

In Lagrangian coordinates, this gives rise to an $L^p$ metric on $\Omega \times \mathbb{R}^*_+$, namely if
$y = m^{1/p}$, then $g(x,y)(v_x, v_y) = (y\|v_x\|)^p + |v_y|^p$. Note that the distance induced
by the $L^p$ norm does not correspond to the standard $L^p$ norm on the Euclidean
cone. However, it is possible to retrieve the standard $L^p$ norm on the cone in the
general setting of Section 4, by pulling it back in spherical coordinates.

3 Static Kantorovich Formulations 

Inspired by the Monge formulation of Section 2.7 which emphasizes the importance
of the distance on the space $\Omega \times \mathbb{R}^*_+$, we now propose a general Kantorovich
problem in Definition 3.3 and its dual formulation in Theorem 3.5. It includes
as a particular case the relaxation of the Monge formulation. This Kantorovich
formulation does not need the cost on the product space to be related to a distance,
yet, when it is the case and under mild assumptions described below, the
Kantorovich formulation defines a distance on $\mathcal{M}_+(\Omega)$ as shown in Theorem 3.2.

3.1 Definitions

In what follows, $\Omega$ is a compact set in $\mathbb{R}^d$, $x$ typically refers to a point in $\Omega$ and
$m$ to a mass. We first require some properties on the cost function.

Definition 3.1 (Cost function). In the sequel, a cost function is a function

\[
c : (\Omega \times [0, +\infty[)^2 \to [0, +\infty], \qquad
\big((x_0, m_0), (x_1, m_1)\big) \mapsto c(x_0, m_0, x_1, m_1)
\]

which is l.s.c. in all its arguments and jointly sublinear in $(m_0, m_1)$.

A sublinear function is by definition a positively 1-homogeneous and subadditive
function, or equivalently, a positively 1-homogeneous and convex function.
The joint sub-additivity of $c$ in $(m_0, m_1)$ guarantees that it is always better to
send mass from one point to another in one single chunk. In order to allow
for variations of mass, we need to adapt the constraint set of standard optimal
transport by introducing two “semi-couplings”, which are relaxed couplings with
only one marginal being fixed.

Definition 3.2 (Semi-couplings). For two marginals $\rho_0, \rho_1 \in \mathcal{M}_+(\Omega)$, the set of
semi-couplings is

\[
\Gamma(\rho_0, \rho_1) := \big\{ (\gamma_0, \gamma_1) \in (\mathcal{M}_+(\Omega^2))^2 : (\mathrm{Proj}_0)_\# \gamma_0 = \rho_0,\ (\mathrm{Proj}_1)_\# \gamma_1 = \rho_1 \big\}. \tag{3.1}
\]

Informally, $\gamma_0(x,y)$ represents the amount of mass that is taken from $\rho_0$
at point $x$ and is then transported to a (possibly different, to account for
creation/destruction) amount of mass $\gamma_1(x,y)$ at point $y$ of $\rho_1$. These semi-couplings
allow us to formulate a novel static Kantorovich formulation of unbalanced optimal
transport as follows.

Definition 3.3 (Unbalanced Kantorovich Problem). For a cost function $c$ we
introduce the functional

\[
J_K(\gamma_0, \gamma_1) := \int_{\Omega^2} c\Big(x, \frac{\gamma_0}{\gamma}, y, \frac{\gamma_1}{\gamma}\Big)\, d\gamma(x,y), \tag{3.2}
\]

where $\gamma \in \mathcal{M}_+(\Omega^2)$ is any measure such that $\gamma_0, \gamma_1 \ll \gamma$. This functional is
well-defined since $c$ is jointly 1-homogeneous w.r.t. the mass variables (see Definition
3.1). The corresponding optimization problem is

\[
C_K(\rho_0, \rho_1) := \inf_{(\gamma_0, \gamma_1) \in \Gamma(\rho_0, \rho_1)} J_K(\gamma_0, \gamma_1). \tag{3.3}
\]

Proposition 3.1. If $c$ is a cost function then a minimizer for $C_K(\rho_0, \rho_1)$ exists.

Proof. Let $\tilde\Omega$ be an open set such that $\Omega \subset \tilde\Omega$ and extend $c$ as $+\infty$ if either $x$ or $y$
belongs to $\tilde\Omega \setminus \Omega$. Then by virtue of [AFP00, Theorem 2.38], $J_K$ is weakly* l.s.c.
on $\mathcal{M}(\tilde\Omega^2)$ and a fortiori on $\mathcal{M}(\Omega^2)$. Since $\Omega$ is compact and the marginals $\rho_0, \rho_1$
have finite mass, $\Gamma(\rho_0, \rho_1)$ is tight and thus weakly* pre-compact. It is also closed
so $\Gamma(\rho_0, \rho_1)$ is weakly* compact and any minimizing sequence admits a cluster
point which is a minimizer (the minimum is not assumed to be finite). □

Example 3.1. Standard optimal transport problems with a nonnegative, l.s.c.
cost $c$ are retrieved as particular cases, by taking

\[
c(x_0, m_0, x_1, m_1) =
\begin{cases}
m_0 \cdot c(x_0, x_1) & \text{if } m_0 = m_1, \\
+\infty & \text{otherwise.}
\end{cases}
\]

Other examples, in particular the WF-metric, are studied in more detail in 
Section 5. 
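
For discrete measures the functional of Definition 3.3 reduces to a finite sum, which makes Example 3.1 easy to check by hand. The sketch below takes the counting measure on the support as the reference measure $\gamma$ (permitted by 1-homogeneity) and evaluates $J_K$ with the cost of Example 3.1. The names `ot_cost` and `J_K`, and the nested-list representation of the semi-couplings, are illustrative choices only.

```python
import math

def ot_cost(ground_cost):
    """Cost of Example 3.1: m0 * c(x0, x1) if m0 == m1, +infinity otherwise."""
    def c(x0, m0, x1, m1):
        if m0 != m1:
            return math.inf
        if m0 == 0:
            return 0.0  # 1-homogeneity: no mass, no cost
        return m0 * ground_cost(x0, x1)
    return c

def J_K(c, xs, ys, gamma0, gamma1):
    """Discrete J_K: sum over (i, j) of c(x_i, gamma0[i][j], y_j, gamma1[i][j]).

    gamma0 and gamma1 hold the densities of the two semi-couplings with
    respect to the counting measure on the grid xs x ys.
    """
    total = 0.0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            total += c(x, gamma0[i][j], y, gamma1[i][j])
    return total
```

When $\gamma_0 = \gamma_1$ is a classical transport plan, `J_K` returns the usual transport cost $\sum_{i,j} c(x_i, y_j)\,\gamma_{ij}$; as soon as the two semi-couplings differ somewhere, the value is $+\infty$, which is how this cost forbids creation or destruction of mass.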

3.2 Properties 

A central property of optimal transport is that it can be used to lift a metric
from the base space $\Omega$ to the space of probability measures over $\Omega$ (cf. [Vil09,
Chapter 6]). We now show that this extends to our framework.

Theorem 3.2 (Metric). Let $c$ be a cost function such that, for some $p \in [1, +\infty[$,

\[
\big((x_0, m_0), (x_1, m_1)\big) \mapsto c(x_0, m_0, x_1, m_1)^{1/p} \tag{3.4}
\]

is a (pseudo-)metric on $\mathrm{Cone}(\Omega)$. Then $C_K(\cdot, \cdot)^{1/p}$ defines a (pseudo-)metric on
$\mathcal{M}_+(\Omega)$.

Remark. The space $\mathrm{Cone}(\Omega)$ is the space $\Omega \times [0, +\infty[$, where all points with
zero mass $\Omega \times \{0\}$ are identified as defined in Definition 2.2. Note also that the
metric $C_K(\cdot, \cdot)^{1/p}$ may take the value $+\infty$ if $c$ does so and the denomination
“pseudo-metric” is sometimes preferred for such objects.

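Proof. First, symmetry and non-negativity are inherited from $c$. Moreover,

\[
\begin{aligned}
[C_K(\rho_0, \rho_1) = 0]
 &\Leftrightarrow [\exists (\gamma_0, \gamma_1) \in \Gamma(\rho_0, \rho_1) : (\gamma_0 = \gamma_1) \text{ and } (x = y\ \ \gamma_0\text{-a.e.})] \\
 &\Leftrightarrow [\rho_0 = \rho_1].
\end{aligned}
\]

It remains to show the triangle inequality. Fix $\rho_0, \rho_1, \rho_2 \in \mathcal{M}_+(\Omega)$. Take
two pairs of minimizers for (3.3)

\[
(\gamma_0^{01}, \gamma_1^{01}) \in \Gamma(\rho_0, \rho_1), \qquad (\gamma_0^{12}, \gamma_1^{12}) \in \Gamma(\rho_1, \rho_2),
\]

and let $\mu \in \mathcal{M}_+(\Omega)$ be such that $(\mathrm{Proj}_1)_\#(\gamma_0^{01}, \gamma_1^{01}) \ll \mu$ and $(\mathrm{Proj}_0)_\#(\gamma_0^{12}, \gamma_1^{12}) \ll \mu$.
Denote by $(\gamma_i^{01}|\mu)(x|y)$ the disintegration of $\gamma_i^{01}$ along the second factor w.r.t.
$\mu$. More precisely, for all $y \in \Omega$, $(\gamma_i^{01}|\mu)(\cdot|y) \in \mathcal{M}_+(\Omega)$ and it holds, for all $f$
measurable on $\Omega^2$,

\[
\int_{\Omega^2} f\, d\gamma_i^{01} = \int_\Omega \Big( \int_\Omega f(x,y)\, d(\gamma_i^{01}|\mu)(x|y) \Big)\, d\mu(y)
\]

and analogously for $(\gamma_i^{12}|\mu)(z|y)$ along the first factor for $i = 0, 1$. Write $\frac{\rho_1}{\mu}(y)$ for
the density of $\rho_1$ w.r.t. $\mu$. We combine the optimal semi-couplings in a suitable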
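way to define $\gamma_0, \gamma_1, \hat\gamma \in \mathcal{M}(\Omega^3)$ (via disintegration w.r.t. $\mu$ along the second
factor):

\[
(\gamma_0|\mu)(x,z|y) :=
\begin{cases}
(\gamma_0^{01}|\mu)(x|y) \otimes (\gamma_0^{12}|\mu)(z|y) \Big/ \frac{\rho_1}{\mu}(y) & \text{if } \frac{\rho_1}{\mu}(y) > 0, \\[4pt]
(\gamma_0^{01}|\mu)(x|y) \otimes \delta_y(z) & \text{otherwise,}
\end{cases}
\]

\[
(\gamma_1|\mu)(x,z|y) :=
\begin{cases}
(\gamma_1^{01}|\mu)(x|y) \otimes (\gamma_1^{12}|\mu)(z|y) \Big/ \frac{\rho_1}{\mu}(y) & \text{if } \frac{\rho_1}{\mu}(y) > 0, \\[4pt]
\delta_y(x) \otimes (\gamma_1^{12}|\mu)(z|y) & \text{otherwise,}
\end{cases}
\]

\[
(\hat\gamma|\mu)(x,z|y) :=
\begin{cases}
(\gamma_1^{01}|\mu)(x|y) \otimes (\gamma_0^{12}|\mu)(z|y) \Big/ \frac{\rho_1}{\mu}(y) & \text{if } \frac{\rho_1}{\mu}(y) > 0, \\[4pt]
0 & \text{otherwise.}
\end{cases}
\]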

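The interpretation of $\gamma_0$ is that all mass that leaves $x$ towards $y$, according to
$\gamma_0^{01}(x,y)$, is distributed over the third factor according to $\gamma_0^{12}(y,z)$. In case the
mass disappears at $y$, it is simply “dropped” as $\delta_y$ on the third factor. Then $\gamma_1$ is
built analogously for the incoming masses and $\hat\gamma$ is a combination of incoming and
outgoing masses. For $i = 0, 1$ let $\gamma_i^{02} = (\mathrm{Proj}_{02})_\# \gamma_i$ and note that, by construction,
$(\gamma_0^{02}, \gamma_1^{02}) \in \Gamma(\rho_0, \rho_2)$. In the rest of the proof, for an improved readability,
when writing the functional the “dummy” measure $\gamma$ such that $\gamma_0, \gamma_1 \ll \gamma$ is
considered as implicit and we write

\[
\int_{\Omega^2} c(x, \gamma_0(x,y), y, \gamma_1(x,y))\, dx\, dy \quad \text{for} \quad \int_{\Omega^2} c\Big(x, \frac{\gamma_0}{\gamma}, y, \frac{\gamma_1}{\gamma}\Big)\, d\gamma(x,y).
\]

With this notation, one has

\[
\begin{aligned}
\int_{\Omega^3} & c(x, \gamma_0(x,y,z), y, \hat\gamma(x,y,z))\, dx\, dy\, dz \\
 &= \int_{\frac{\rho_1}{\mu}(y) > 0} \Big( \int_{\Omega^2} c\big(x, (\gamma_0^{01}|\mu)(x|y), y, (\gamma_1^{01}|\mu)(x|y)\big)\, \frac{(\gamma_0^{12}|\mu)(z|y)}{\frac{\rho_1}{\mu}(y)}\, dx\, dz \Big)\, d\mu(y) \\
 &\quad + \int_{\frac{\rho_1}{\mu}(y) = 0} \Big( \int_{\Omega^2} c\big(x, (\gamma_0^{01}|\mu)(x|y), y, 0\big)\, \delta_y(z)\, dx\, dz \Big)\, d\mu(y) \\
 &= \int_\Omega \Big( \int_\Omega c\big(x, (\gamma_0^{01}|\mu)(x|y), y, (\gamma_1^{01}|\mu)(x|y)\big)\, dx \Big)\, d\mu(y) \\
 &= J_K(\gamma_0^{01}, \gamma_1^{01}) = C_K(\rho_0, \rho_1),
\end{aligned}
\]

and analogously

\[
\int_{\Omega^3} c(y, \hat\gamma(x,y,z), z, \gamma_1(x,y,z))\, dx\, dy\, dz = C_K(\rho_1, \rho_2).
\]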
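One finally obtains

\[
\begin{aligned}
C_K(\rho_0, \rho_2)^{\frac1p}
 &\le \Big( \int_{\Omega^2} c(x, \gamma_0^{02}(x,z), z, \gamma_1^{02}(x,z))\, dx\, dz \Big)^{\frac1p} \\
 &\overset{(1)}{\le} \Big( \int_{\Omega^3} c(x, \gamma_0(x,y,z), z, \gamma_1(x,y,z))\, dx\, dy\, dz \Big)^{\frac1p} \\
 &\overset{(2)}{\le} \Big( \int_{\Omega^3} \Big[ c(x, \gamma_0(x,y,z), y, \hat\gamma(x,y,z))^{\frac1p} + c(y, \hat\gamma(x,y,z), z, \gamma_1(x,y,z))^{\frac1p} \Big]^p dx\, dy\, dz \Big)^{\frac1p} \\
 &\overset{(3)}{\le} \Big( \int_{\Omega^3} c(x, \gamma_0(x,y,z), y, \hat\gamma(x,y,z))\, dx\, dy\, dz \Big)^{\frac1p}
   + \Big( \int_{\Omega^3} c(y, \hat\gamma(x,y,z), z, \gamma_1(x,y,z))\, dx\, dy\, dz \Big)^{\frac1p} \\
 &\overset{(4)}{=} C_K(\rho_0, \rho_1)^{\frac1p} + C_K(\rho_1, \rho_2)^{\frac1p}
\end{aligned}
\]

where we used (1) the convexity of $c$, (2) the fact that $c^{1/p}$ satisfies the triangle
inequality, (3) Minkowski’s inequality and (4) comes from the computations
above. Thus $C_K(\cdot, \cdot)^{1/p}$ satisfies the triangle inequality, and is a metric. □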

Next, we establish under reasonable assumptions on the cost $c$ that the
corresponding unbalanced transport cost $C_K$ is weakly* continuous w.r.t. the
marginals. This is important for numerical applications, since it implies robustness
when one approximates the original measures with discrete sums of Diracs.
It is also crucial to study the well posedness of variational problems involving
these metrics, such as for instance the definition of barycenters (see [ ] for
the classical optimal transport case). Moreover it will be a useful tool in our
further analysis (e.g. Theorem 4.3 and examples in Sect. 5).

Theorem 3.3. Assume $c$ satisfies the assumptions of Theorem 3.2 and:

(A1) Continuity of the spatial metric: $(x,y) \mapsto c(x, 1, y, 1)$ is continuous.

(A2) Finite cost of mass disappearance: there exists $x \in \Omega$ such that $c(x, 1, x, 0) < \infty$.

Then $C_K$ is continuous for the weak* topology on $\mathcal{M}_+(\Omega)^2$.

Lemma 3.4. Under the assumptions of Theorem 3.3, for some $p \in [1, +\infty[$,

(B1) $(x_0, x_1) \mapsto c(x_0, 1, x_1, 1)^{1/p}$ is equivalent to the Euclidean metric;

(B2) $x \mapsto c(x, 1, x, 0)$ is bounded on $\Omega$;

(B3) the family of functions $\{m \mapsto c(x, 1, x, m)\}_{x \in \Omega}$ is equicontinuous at $m = 1$.
Proof. (B1): Two metrics $d_1, d_2$ are equivalent when

\[
d_1(x_n, x) \to 0 \ \Leftrightarrow\ d_2(x_n, x) \to 0.
\]

Let $d_1(x,y) = |x - y|$ be the Euclidean metric on $\Omega$ and $d_2(x,y) = c(x, 1, y, 1)^{1/p}$.
Assumption (A1) implies “$\Rightarrow$”. Now we want to show that $d_2(x_n, x) \to 0$ implies
$d_1(x_n, x) \to 0$. Let $\varepsilon > 0$ and let $S = \Omega \cap B(x, \varepsilon)$ ($B(x, \varepsilon)$ denoting the open
ball of radius $\varepsilon$ around $x$). Then $\Omega \setminus S$ is compact. Therefore $x' \mapsto d_2(x, x')$
attains its minimum $\delta$ over $\Omega \setminus S$ on that set and by the metric property $\delta > 0$.
Since $d_2(x_n, x) \to 0$ there is some $N \in \mathbb{N}$ such that $d_2(x_n, x) < \delta$ for $n > N$ and
consequently $x_n \in B(x, \varepsilon)$ for $n > N$.

(B2) For any $y \in \Omega$ one has by the triangle inequality and the identification of
zero-mass points $c(y, 1, y, 0)^{1/p} \le c(y, 1, x, 1)^{1/p} + c(x, 1, x, 0)^{1/p}$. The first term
is bounded since it is continuous in $y$ and $\Omega$ is compact. The second term is
bounded by assumption (A2).

(B3) Let $C$ be the uniform bound of $c(x, 1, x, 0)$ over $\Omega$. By 1-homogeneity
and triangle inequality it holds $c(x, 1, x, 2) \le (c(x, 1, x, 0)^{1/p} + c(x, 2, x, 0)^{1/p})^p \le
C(1 + 2^{1/p})^p := C'$. By convexity of $c$, $c(x, m, x, 1) \le |m - 1|\, C'$ for $m \in [0, 2]$.
As $c$ is nonnegative, the family $\{m \mapsto c(x, 1, x, m)\}_{x \in \Omega}$ is therefore uniformly
continuous at $m = 1$. □

Proof of Theorem 3.3. Throughout this proof we are going to use the properties
(B1) to (B3) established in Lemma 3.4. Thanks to the triangle inequality, we
only need to check that, if $\rho_n \rightharpoonup^* \rho$ then $C_K(\rho_n, \rho) \to 0$.

Recall that weak* convergence implies (1) convergence of the total masses
and (2) convergence for the $p$-Wasserstein distance $W_p$ ($p \in [1, +\infty[$) of the
rescaled measures (see for instance [Vil03, Theorem 7.12]). We denote by $W_{p,c}$
the Wasserstein metric induced by the ground metric $(x,y) \mapsto c(x, 1, y, 1)^{1/p}$.
By property (B1) this ground metric induces the same topology on $\Omega$ as the
Euclidean metric. Therefore both metrics induce the same space of continuous
functions $C(\Omega)$ and therefore convergence of $W_p$ and $W_{p,c}$ is equivalent, too.

If $\rho = 0$ then, by taking $(\gamma_n = (\mathrm{id}, \mathrm{id})_\# \rho_n, 0) \in \Gamma(\rho_n, \rho)$ we see that, by
property (B2),

\[
C_K(\rho_n, \rho) \le \int_{\Omega^2} c(x_0, 1, x_1, 0)\, d\gamma_n(x_0, x_1) \le \Big( \sup_{x \in \Omega} |c(x, 1, x, 0)| \Big) \cdot \rho_n(\Omega) \to 0.
\]

Otherwise, let us assume that for all $n \in \mathbb{N}$, $\rho_n(\Omega) > 0$ as it is eventually true.
Introduce $\mathcal{N}(\rho_n) = (\rho(\Omega)/\rho_n(\Omega)) \cdot \rho_n$. It holds, by the triangle inequality,

\[
C_K^{1/p}(\rho_n, \rho) \le C_K^{1/p}(\rho_n, \mathcal{N}(\rho_n)) + C_K^{1/p}(\mathcal{N}(\rho_n), \rho).
\]

On the one hand, by choosing $\gamma_0 = (\mathrm{id}, \mathrm{id})_\# \rho_n$ and $\gamma_1 = (\rho(\Omega)/\rho_n(\Omega)) \cdot \gamma_0$ we
have, by property (B3),

\[
C_K(\rho_n, \mathcal{N}(\rho_n)) \le J_K(\gamma_0, \gamma_1) = \int_\Omega c\Big(x, 1, x, \frac{\rho(\Omega)}{\rho_n(\Omega)}\Big)\, d\rho_n(x) \to 0.
\]

On the other hand, by choosing $\gamma_0 = \gamma_1$ equal to the optimal transport plan for
the $W_{p,c}$ metric between $\rho$ and $\mathcal{N}(\rho_n)$, we have by property (B1),

\[
C_K(\mathcal{N}(\rho_n), \rho) \le \int_{\Omega^2} c(x, 1, y, 1)\, d\gamma_0 \to 0
\]

and we conclude thus that $C_K(\rho_n, \rho) \to 0$. □

Let us now give a dual formulation of the problem. Since the cost function
$c$ is jointly 1-homogeneous, convex, and l.s.c. in the variables $(m_0, m_1)$, for all
$(x,y) \in \Omega^2$, the Legendre transform of $c(x, \cdot, y, \cdot)$ is the indicator of a closed
convex set. Moreover, as $c$ is non-negative, and $+\infty$ if $m_0$ or $m_1$ is negative,
the latter contains the negative orthant. This set is described by the set valued
function $Q : \Omega^2 \to 2^{\mathbb{R} \times \mathbb{R}}$. For nonempty convex valued multifunctions, $Q$ is said
to be lower semicontinuous if $G = \{(x,t) : x \in \mathrm{int}\, Q(t)\}$ is open (see [Roc71,
Lemma 2]).

Theorem 3.5 (Duality). Let $c$ be a cost function, define for all $(x,y) \in \Omega^2$ the
nonempty, closed and convex set $Q(x,y) := \{(a,b) \in \mathbb{R}^2 : c(x, \cdot, y, \cdot)^*(a,b) < +\infty\}$
and let

\[
B = \big\{ (\phi, \psi) \in C(\Omega)^2 : \forall (x,y) \in \Omega^2,\ (\phi(x), \psi(y)) \in Q(x,y) \big\}.
\]

If $Q$ is lower semicontinuous then

\[
\sup_{(\phi, \psi) \in B} \int_\Omega \phi(x)\, d\rho_0 + \int_\Omega \psi(y)\, d\rho_1 = \min_{(\gamma_0, \gamma_1) \in \Gamma(\rho_0, \rho_1)} J_K(\gamma_0, \gamma_1).
\]

Proof. Let us rewrite the supremum problem as

\[
\sup_{(u_0, u_1) \in C(\Omega^2)^2} -F(u_0, u_1) - G(u_0, u_1)
\]

where

\[
G : (u_0, u_1) \mapsto
\begin{cases}
-\int_\Omega \phi(x)\, d\rho_0 - \int_\Omega \psi(y)\, d\rho_1 & \text{if } u_0(x,y) = \phi(x) \text{ and } u_1(x,y) = \psi(y), \\
+\infty & \text{otherwise,}
\end{cases}
\]

and $F$ is the indicator of $\{(u_0, u_1) \in C(\Omega^2)^2 : (u_0, u_1)(x,y) \in Q(x,y),\ \forall (x,y) \in \Omega^2\}$.
Note that $F$ and $G$ are convex and proper. Also, given our assumptions,
there is a pair of functions $(u_0, u_1)$ at which $F$ is continuous (for the sup norm
topology) and $F$ and $G$ are finite since for all $(x,y) \in \Omega^2$, $Q(x,y)$ contains the
negative orthant. Then Fenchel-Rockafellar duality theorem (see, e.g. [Vil03,
Theorem 1.9]) states that

\[
\sup_{(u_0, u_1) \in C(\Omega^2)^2} -F(u_0, u_1) - G(u_0, u_1)
 = \min_{(\gamma_0, \gamma_1) \in \mathcal{M}(\Omega^2)^2} \big\{ F^*(\gamma_0, \gamma_1) + G^*(-\gamma_0, -\gamma_1) \big\}. \tag{3.5}
\]
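Let us compute the Legendre transforms. For $G$, we obtain

\[
\begin{aligned}
G^*(-\gamma_0, -\gamma_1)
 &= \sup_{(\phi, \psi) \in C(\Omega)^2} -\int_{\Omega^2} \phi(x)\, d\gamma_0 - \int_{\Omega^2} \psi(y)\, d\gamma_1 + \int_\Omega \phi(x)\, d\rho_0 + \int_\Omega \psi(y)\, d\rho_1 \\
 &= \begin{cases}
 0 & \text{if } (\gamma_0, \gamma_1) \in \Gamma^{+/-}_{\rho_0, \rho_1}, \\
 +\infty & \text{otherwise,}
 \end{cases}
\end{aligned}
\]

where $\Gamma^{+/-}_{\rho_0, \rho_1}$ is the set of semi-couplings without the non-negativity constraint.
On the other hand, $F^*$ is given by [Roc71, Theorem 6] which states that

\[
F^*(\gamma_0, \gamma_1) = \int_{\Omega^2} c\Big(x, \frac{\gamma_0}{\gamma}, y, \frac{\gamma_1}{\gamma}\Big)\, d\gamma(x,y)
\]

where $\gamma$ is any measure in $\mathcal{M}_+(\Omega^2)$ with respect to which $(\gamma_0, \gamma_1)$ is absolutely
continuous. Finally, as $F^*$ includes the nonnegativity constraint, the right hand
side of (3.5) is equal to $\min_{(\gamma_0, \gamma_1) \in \Gamma(\rho_0, \rho_1)} J_K(\gamma_0, \gamma_1)$. □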
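
As a sanity check of Theorem 3.5, the set $Q(x,y)$ can be computed by hand for the standard optimal transport cost of Example 3.1. In the sketch below, $\bar c$ denotes the ground cost of that example (the bar is notation introduced here to avoid overloading $c$), and $\bar c$ is assumed continuous so that $Q$ is lower semicontinuous.

```latex
% Legendre transform of c(x,\cdot,y,\cdot) for the cost of Example 3.1
\begin{align*}
c(x,\cdot,y,\cdot)^*(a,b)
  &= \sup_{m_0, m_1 \ge 0} \; a\,m_0 + b\,m_1 - c(x,m_0,y,m_1) \\
  &= \sup_{m \ge 0} \; m\,\bigl(a + b - \bar c(x,y)\bigr)
   = \begin{cases} 0 & \text{if } a + b \le \bar c(x,y), \\ +\infty & \text{otherwise,} \end{cases} \\
Q(x,y) &= \{(a,b) \in \mathbb{R}^2 : a + b \le \bar c(x,y)\}, \\
B &= \{(\phi,\psi) \in C(\Omega)^2 : \phi(x) + \psi(y) \le \bar c(x,y) \ \ \forall (x,y) \in \Omega^2\}.
\end{align*}
```

In this case the theorem reduces to the classical Kantorovich duality, with $Q(x,y)$ indeed containing the negative orthant since $\bar c \ge 0$.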


4 From Dynamic Models to Static Problems 

In this section we establish that a certain class of convex, positively homogeneous, 
optimization problems over the solutions of the continuity equation with source 
(informally introduced in 1.2) — the dynamic problems — admits unbalanced 
Kantorovich formulations that we introduced in Sect. 3 — the static problems. 
We prove an equivalence result between static and dynamic models for a large 
class of dynamic models. 


4.1 A Family of Dynamic Problems 


Dynamic formulations of unbalanced transportation metrics correspond intuitively
to the computation of geodesic distances according to a function measuring
the infinitesimal effort needed for “acting” on a mass $m$ at position $x$ according
to the speed $v$ and rate of growth $\alpha$ (cf. (1.4)). This should be contrasted with
the static formulation (3.3) that depends on a cost function $c(x_0, m_0, x_1, m_1)$
between pairs of positions and masses.

The continuity equation, informally introduced in (1.2), is a concept underlying
all dynamical formulations of this article. It enforces a local mass preservation
constraint for a density $\rho$, a flow field $v$ and a growth rate field $\alpha$. We now give
a rigorous definition in terms of measures $(\rho, \omega, \zeta)$ where $\omega$ can informally be
interpreted as momentum $\rho \cdot v$ of the flow field and $\zeta$ corresponds to $\rho \cdot \alpha$.

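Definition 4.1 (Continuity equation with source). For $(a, b) \in \mathbb{R}^2$ and $\Omega \subset \mathbb{R}^d$
compact, denote by $\mathcal{CE}_a^b(\rho_0, \rho_1)$ the affine subset of $\mathcal{M}([a,b] \times \Omega) \times \mathcal{M}([a,b] \times
\Omega)^d \times \mathcal{M}([a,b] \times \Omega)$ of triplets of measures $\mu = (\rho, \omega, \zeta)$ satisfying the continuity
equation $\partial_t \rho + \nabla \cdot \omega = \zeta$ in the distribution sense, interpolating between $\rho_0$ and
$\rho_1$ and satisfying homogeneous Neumann boundary conditions. More precisely
we require

\[
\int_a^b \int_\Omega \partial_t \varphi\, d\rho + \int_a^b \int_\Omega \nabla \varphi \cdot d\omega + \int_a^b \int_\Omega \varphi\, d\zeta
 = \int_\Omega \varphi(b, \cdot)\, d\rho_1 - \int_\Omega \varphi(a, \cdot)\, d\rho_0 \tag{4.1}
\]

for all $\varphi \in C^1([a, b] \times \Omega)$.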
Below we collect without proof a few standard facts on this equation which
will be useful for the following. The notation $B^d(0, r)$ denotes the open ball of
radius $r$ in $\mathbb{R}^d$ centered at the origin.

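Proposition 4.1.

Glueing Let $(a, b, c) \in \mathbb{R}^3$ satisfying $a < b < c$, $(\rho_a, \rho_b, \rho_c) \in \mathcal{M}_+(\Omega)^3$, $\mu_A \in
\mathcal{CE}_a^b(\rho_a, \rho_b)$ and $\mu_B \in \mathcal{CE}_b^c(\rho_b, \rho_c)$. Then the measure $\mu$ defined as $\mu_A$ for
$t \in [a, b[$ and $\mu_B$ for $t \in [b, c]$ belongs to $\mathcal{CE}_a^c(\rho_a, \rho_c)$.

Smoothing Let $\varepsilon > 0$, let $r_\varepsilon^x$, $r_\varepsilon^t$ be mollifiers supported on the open balls $B^d(0, \frac{\varepsilon}{2})$
and $B^1(0, \frac{\varepsilon}{2})$ respectively and $r_\varepsilon : (t,x) \mapsto r_\varepsilon^t(t)\, r_\varepsilon^x(x)$. Let $\mu = (\rho, \omega, \zeta)$
be a triplet of measures supported on $\mathbb{R} \times \Omega$ such that $\mu \in \mathcal{CE}_0^1(\rho_0, \rho_1)$,
$\mu = (\rho_0, 0, 0) \otimes dt$ for $t < 0$, and $\mu = (\rho_1, 0, 0) \otimes dt$ for $t > 1$. Then for all
$a < -\varepsilon/2$, and $b > 1 + \varepsilon/2$, $\mu * r_\varepsilon \in \mathcal{CE}_a^b(\rho_0 * r_\varepsilon^x, \rho_1 * r_\varepsilon^x)$ on $\Omega + B^d(0, \varepsilon/2)$.

Scaling Let $\mu = (\rho, \omega, \zeta) \in \mathcal{CE}_a^b(\rho_a, \rho_b)$ with $a < b$ and $T : (t, x) \mapsto (T_t(t), T_x(x))$
be an affine scaling with multiplication factor $\alpha$ and $\beta$, respectively. Then
$(\alpha \cdot T_\# \rho, \beta \cdot T_\# \omega, T_\# \zeta) \in \mathcal{CE}_{T_t(a)}^{T_t(b)}((T_x)_\#(\rho_a), (T_x)_\#(\rho_b))$ on the domain
$T_x(\Omega)$.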

Next, we introduce the admissible class of infinitesimal costs which generalizes the admissible Riemannian inner products defined in Section 2.2. Although a standard Lagrangian cost would be defined as a function of the speed $v$ and the growth rate $\alpha$, our infinitesimal cost has nicer analytical properties when defined in terms of the momentum $\omega$ and the source $\zeta$ variables. Indeed, it is then natural to require subadditivity (i.e. we expect that "locally", mass is not encouraged to split) and homogeneity in $(\rho, \omega, \zeta)$ (thus convexity). Finally, this change of variables allows interesting costs where $\omega$ or $\zeta$ do not necessarily admit a density w.r.t. $\rho$ (see Section 5.1).

Definition 4.2 (Infinitesimal cost). In the following, an infinitesimal cost is a function $f : \Omega \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}_+ \cup \{+\infty\}$ such that for all $x \in \Omega$, $f(x, \cdot, \cdot, \cdot)$ is convex, positively 1-homogeneous, lower semicontinuous and satisfies

$$f(x, \rho, \omega, \zeta) \begin{cases} = 0 & \text{if } (\omega, \zeta) = (0, 0) \text{ and } \rho \ge 0 \\ > 0 & \text{if } |\omega| \text{ or } |\zeta| > 0 \\ = +\infty & \text{if } \rho < 0 . \end{cases}$$

For Theorem 4.3, we also require (i) that there exist continuous functions $\lambda_i : \Omega \to \mathbb{R}_+^*$, $i \in \{1, \dots, N\}$ such that

$$f(x, \rho, \omega, \zeta) = \sum_{i=1}^N \lambda_i(x) f_i(\rho, \omega, \zeta) \tag{4.2}$$

where each $f_i$ satisfies an integrability condition: there exists $C_i > 0$ such that $|f_i(\cdot)| \le C_i \max_x f(x, \cdot)$, and (ii) a non-degeneracy condition: there exists $C > 0$ such that $f(x, \rho, \omega, 2\zeta) \le C \cdot f(x, \rho, \omega, \zeta)$ for all $(x, \rho, \omega, \zeta) \in \Omega \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}$.

Note that, in particular, any admissible Riemannian metric in Section 2 satisfies these conditions (by equation (2.5)). The dynamic formulation is now defined as the computation of a geodesic length for the infinitesimal cost $f$.

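Definition 4.3 (Dynamic problem). For $(\rho, \omega, \zeta) \in \mathcal{M}([0,1] \times \Omega)^{1+d+1}$, let

$$J_D(\rho, \omega, \zeta) \stackrel{\mathrm{def.}}{=} \int_0^1 \int_\Omega f\left(x, \frac{d\rho}{d\lambda}, \frac{d\omega}{d\lambda}, \frac{d\zeta}{d\lambda}\right) d\lambda(t, x) \tag{4.3}$$

where $\lambda \in \mathcal{M}_+([0,1] \times \Omega)$ is such that $(\rho, \omega, \zeta) \ll \lambda$. Due to the 1-homogeneity of $f$ this definition does not depend on the choice of $\lambda$. The dynamic problem is, for $\rho_0, \rho_1 \in \mathcal{M}_+(\Omega)$,

$$C_D(\rho_0, \rho_1) \stackrel{\mathrm{def.}}{=} \inf_{(\rho, \omega, \zeta) \in \mathcal{CE}_0^1(\rho_0, \rho_1)} J_D(\rho, \omega, \zeta) . \tag{4.4}$$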
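The independence from the reference measure $\lambda$ stated in Definition 4.3 can be checked numerically on a discrete example. The following is a minimal sketch that is not part of the article: it assumes measures supported on finitely many space-time points, takes the WF-type integrand $(|\omega|^2 + \delta^2 \zeta^2)/(2\rho)$ of Section 5.2 (with scalar $\omega$) as an example of a 1-homogeneous cost, and the helper names `f_wf` and `action` are hypothetical.

```python
import numpy as np

def f_wf(rho, omega, zeta, delta=1.0):
    """Example 1-homogeneous infinitesimal cost (scalar omega for simplicity)."""
    rho, omega, zeta = map(np.asarray, (rho, omega, zeta))
    out = np.full(rho.shape, np.inf)          # +infinity outside the domain
    pos = rho > 0
    out[pos] = (omega[pos] ** 2 + delta ** 2 * zeta[pos] ** 2) / (2.0 * rho[pos])
    zero = (rho == 0) & (omega == 0) & (zeta == 0)
    out[zero] = 0.0
    return out

def action(rho, omega, zeta, lam, delta=1.0):
    """Discrete analogue of J_D: sum of f(densities w.r.t. lam) weighted by lam."""
    lam = np.asarray(lam, dtype=float)
    return float(np.sum(f_wf(rho / lam, omega / lam, zeta / lam, delta) * lam))

rng = np.random.default_rng(0)
n = 50
rho = rng.uniform(0.1, 2.0, n)    # weights of the atoms of rho
omega = rng.normal(size=n)        # weights of the atoms of omega
zeta = rng.normal(size=n)         # weights of the atoms of zeta

lam_a = rho + np.abs(omega) + np.abs(zeta)   # one dominating measure
lam_b = rng.uniform(0.5, 3.0, n)             # another dominating measure

j_a = action(rho, omega, zeta, lam_a)
j_b = action(rho, omega, zeta, lam_b)
j_direct = float(np.sum((omega ** 2 + zeta ** 2) / (2.0 * rho)))
```

Both choices of reference weights give the same value up to rounding, because each summand has the form $f(u/\lambda)\,\lambda$ with $f$ positively 1-homogeneous.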

Similarly to the static case (Theorem 3.5), the dynamic setting also enjoys 
a dual formulation. For the definition of lower semicontinuity for set valued 
functions, see the preliminaries before Theorem 3.5. 

Proposition 4.2 (Duality). Let $B(x)$ be the polar set of $f(x, \cdot, \cdot, \cdot)$ for all $x \in \Omega$ and assume it is a lower semicontinuous set-valued function. Then the minimum of (4.4) is attained and it holds

$$C_D(\rho_0, \rho_1) = \sup_{\varphi \in K} \int_\Omega \varphi(1, \cdot) \, d\rho_1 - \int_\Omega \varphi(0, \cdot) \, d\rho_0 \tag{4.5}$$

with $K \stackrel{\mathrm{def.}}{=} \{ \varphi \in C^1([0,1] \times \Omega) : (\partial_t \varphi, \nabla \varphi, \varphi) \in B(x), \ \forall (t, x) \in [0,1] \times \Omega \}$.

Proof. Remark that (4.5) can be written as

$$- \inf_{\varphi \in C^1([0,1] \times \Omega)} F(A\varphi) + G(\varphi)$$

where $A : \varphi \mapsto (\partial_t \varphi, \nabla \varphi, \varphi)$ is a bounded linear operator from $C^1([0,1] \times \Omega)$ to $C([0,1] \times \Omega)^{d+2}$, and $F : (\alpha, \beta, \gamma) \mapsto \int_0^1 \int_\Omega \iota_{B(x)}(\alpha(t,x), \beta(t,x), \gamma(t,x)) \, dx \, dt$, $G : \varphi \mapsto \int_\Omega \varphi(0, \cdot) \, d\rho_0 - \int_\Omega \varphi(1, \cdot) \, d\rho_1$ are convex, proper and lower-semicontinuous functionals, in particular because for all $x \in \Omega$, the set $B(x)$ is convex, closed and nonempty. Since we assumed that $f(x, \rho, \omega, \zeta) > 0$ if $|\omega| > 0$ or $|\zeta| > 0$ and $f$ is continuous as a function of $x$ on the compact $\Omega$, one can check that there exists $\varepsilon > 0$ such that $(-\varepsilon, 0, \delta\varepsilon/2) \in \mathrm{int}\, B(x)$ for $\delta \in [-1, 1]$ and thus there exists a function $\varphi : t \mapsto -\varepsilon t + \varepsilon/2$ such that $F(A\varphi) + G(\varphi) < +\infty$ and $F$ is continuous at $A\varphi$. Then, by Fenchel-Rockafellar duality, (4.5) is equal to

$$\min_{\mu \in \mathcal{M}([0,1] \times \Omega)^{d+2}} G^*(-A^*\mu) + F^*(\mu) .$$

By [Roc71, Theorem 6] and the lower-semicontinuity of $x \mapsto B(x)$, we have $F^* = J_D$, and by direct computations, $G^* \circ (-A^*)$ is the convex indicator of $\mathcal{CE}_0^1(\rho_0, \rho_1)$. □
4.2 Connection Between Static and Dynamic Problems

Theorem 4.3 is the main result of this section. It states that if the cost $c$ of the static problem is defined as the dynamic cost between Diracs, then the static and the dynamic formulations coincide. The cost $c$ of the static problem can alternatively be determined as a generalized action on the path space as measured by an infinitesimal cost $f$ (see Proposition 4.4). In the Riemannian setting this means that $c$ is the squared geodesic distance on $\mathrm{Cone}(\Omega)$, see Sect. 2.2. Of course, there are static costs $c$ that do not arise from a dynamic cost $f$, just as there are dynamic problems beyond the framework of Definition 4.2 which cannot be cast into a static form.

Definition 4.4. The Dirac-based cost is

$$c_d : (x_0, m_0), (x_1, m_1) \mapsto C_D(m_0 \delta_{x_0}, m_1 \delta_{x_1}) . \tag{4.6}$$

If $c_d$ is l.s.c. then it defines a cost function.

Definition 4.5. The path-based cost $c_s$ is defined by a minimization over smooth trajectories

$$c_s(x_0, m_0, x_1, m_1) \stackrel{\mathrm{def.}}{=} \inf_{(x(t), m(t))} \int_0^1 f(x(t), m(t), m(t) x'(t), m'(t)) \, dt \tag{4.7}$$

for $(x(t), m(t)) \in C^1([0,1], \Omega \times [0, +\infty[)$ such that $(x(i), m(i)) = (x_i, m_i)$ for $i \in \{0, 1\}$. In general, $c_s$ does not define a cost function in the sense of Definition 3.1.

Theorem 4.3. Let $\Omega \subset \mathbb{R}^d$ be a compact set which is star shaped w.r.t. a set of points with nonempty interior, and let $c$ be a cost function satisfying $c_d \le c \le c_s$. If the associated static problem $C_K$ is weakly* continuous, it holds, for $\rho_0, \rho_1 \in \mathcal{M}_+(\Omega)$, $C_D(\rho_0, \rho_1) = C_K(\rho_0, \rho_1)$ and $c = c_d$.

Sufficient conditions on $c$ for the weak* continuity of $C_K$ are given in Theorem 3.3. The assumptions on the domain include convex sets but also most star-shaped sets. It is however not our intention to look for the weakest assumptions on the domain for our result to hold. In general, computing $c_d$ directly is not easy: the margin in the choice of $c$ in Theorem 4.3 makes it possible to obtain $c_d$ as a consequence of the theorem, not as a requirement. A natural choice for $c$ is the convex regularization of the cost on the path space $c_s$, which is easier to compute than $c_d$. It can be nicely expressed as the optimal cost on the path space when allowing the Dirac to be split into two chunks:

Proposition 4.4. The convex regularization of $c_s$ can be expressed as

$$c : (x_0, m_0), (x_1, m_1) \mapsto \inf_{\substack{m_0^a + m_0^b = m_0 \\ m_1^a + m_1^b = m_1}} c_s(x_0, m_0^a, x_1, m_1^a) + c_s(x_0, m_0^b, x_1, m_1^b) . \tag{4.8}$$

It is convex, positively homogeneous in $(m_0, m_1)$, and satisfies $c_d \le c \le c_s$.

Proof. The fact that (4.8) is the convex regularization of $c_s$ is a consequence of Carathéodory's Theorem (see [Roc15, Corollary 17.1.6]). It is clear that $c_d \le c \le c_s$. □

Note that in general, $c_d \neq c_s$, since, as opposed to the dynamic problem $C_D$, the problem defining $c_s$ is not allowed to split mass. For instance, the cut distance is $\pi/2$ for the WF metric (see Section 5.2) while it is $\pi$ for the Riemannian metric on the cone (see Proposition 2.3). We now proceed to the proof of Theorem 4.3.

Proof of Theorem 4.3. It is clear that $c_d \le c_s$ because for any candidate path $(x(t), m(t))$ in (4.7), $m(t) \delta_{x(t)} \otimes dt \in \mathcal{CE}_0^1(m_0 \delta_{x_0}, m_1 \delta_{x_1})$, and thus the assumption $c_d \le c \le c_s$ is not void. The proof is divided into four steps: in Step 1, we show that for marginals which are atomic measures, it holds $C_K \ge C_D$. By integrating characteristics (an argument similar to the original proof of the Benamou-Brenier formula [ ]), we show in Step 2 that for absolutely continuous marginals, $C_K$ is upper bounded by the dynamic minimization problem restricted to smooth fields. In Step 3, a regularization argument (inspired by [Vil03, Theorem 8.1]) then extends this result to general measures and to the actual problem $C_D$. This is where the weak* continuity of $C_K$ intervenes. The conclusion, in Step 4, follows by a density argument.

Step 1. Let $\rho_0, \rho_1 \in \mathcal{M}_+^{\mathrm{at}}(\Omega)$. For the static problem $C_K$, there exists a minimizer in the set $\Gamma(\rho_0, \rho_1) \cap \mathcal{M}_+^{\mathrm{at}}(\Omega^2)^2$. Indeed, an atom assigned to another atom is represented by an atom in $\Omega^2$, and the same can be forced to hold for an atom assigned to the apex, because for all $x \in \Omega$, $c(x, 1, \cdot, 0)$ attains its minimum in $\Omega$. Let $(\gamma_0, \gamma_1)$ be such a minimizer; it can be written $\gamma_k = \sum_{i,j} m^k_{i,j} \delta_{(x_i, x_j)}$ for $k = 0, 1$. Since $c \ge c_d$, it holds

$$C_K(\rho_0, \rho_1) = \sum_{i,j} c(x_i, m^0_{i,j}, x_j, m^1_{i,j}) \ge \sum_{i,j} C_D(m^0_{i,j} \delta_{x_i}, m^1_{i,j} \delta_{x_j}) \ge C_D(\rho_0, \rho_1)$$

where the last inequality is due to the sub-additivity of $C_D$, inherited from $J_D$.
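Step 2. Let $\rho_0, \rho_1 \in \mathcal{M}_+^{\mathrm{ac}}(\Omega)$ be marginals with positive mass and let

$$(\rho, \omega, \zeta) \in \left\{ (\rho, \omega, \zeta) \in \mathcal{CE}_0^1(\rho_0, \rho_1) : \rho \in C^1([0,1] \times \Omega) \right\} .$$

One can take the Lagrangian coordinates $(\varphi_t(x), \lambda_t(x))$ which are given by the flow of $(v \stackrel{\mathrm{def.}}{=} \omega/\rho, \ \alpha \stackrel{\mathrm{def.}}{=} \zeta/\rho)$ defined as in Proposition 2.7:

$$\partial_t \varphi_t(x) = v_t(\varphi_t(x)) \quad \text{and} \quad \partial_t \lambda_t(x) = \alpha_t(\varphi_t(x)) \lambda_t(x),$$

with the initial condition $(\varphi_0(x), \lambda_0(x)) = (x, 1)$. Recall that $(\varphi_t(x), \lambda_t(x))$ describes the position and the relative increase of mass at time $t$ of a particle initially at position $x$ and that one has $\rho_t = (\varphi_t)_\#(\lambda_t \rho_0)$. It follows,

$$\begin{aligned}
J_D(\rho, \omega, \zeta) &= \int_{[0,1] \times \Omega} f(x, 1, v_t(x), \alpha_t(x)) \, d[(\varphi_t)_\#(\lambda_t \rho_0)](x) \, dt \\
&\overset{(1)}{=} \int_\Omega \left[ \int_0^1 f\big(\varphi_t(x), 1, \partial_t \varphi_t(x), \partial_t \lambda_t(x)/\lambda_t(x)\big) \lambda_t(x) \, dt \right] d\rho_0(x) \\
&\overset{(2)}{=} \int_\Omega \left[ \int_0^1 f\big(\varphi_t(x), \lambda_t(x), \lambda_t(x) \, \partial_t \varphi_t(x), \partial_t \lambda_t(x)\big) \, dt \right] d\rho_0(x) \\
&\overset{(3)}{\ge} \int_\Omega c(x, 1, \varphi_1(x), \lambda_1(x)) \, d\rho_0(x) \\
&\overset{(4)}{\ge} C_K(\rho_0, \rho_1)
\end{aligned}$$

where we used (1) the change of variables formula, (2) the homogeneity of $f$, (3) the assumption $c \le c_s$ and (4) the fact that $((\mathrm{id}, \varphi_1)_\# \rho_0, (\mathrm{id}, \varphi_1)_\#(\lambda_1 \rho_0)) \in \Gamma(\rho_0, \rho_1)$.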


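Step 3. Let $\rho_0, \rho_1 \in \mathcal{M}_+(\Omega)$. We want to show, with the help of Step 2, that $C_K(\rho_0, \rho_1) \le C_D(\rho_0, \rho_1)$. Let $(\rho, \omega, \zeta) \in \mathcal{CE}_0^1(\rho_0, \rho_1)$ and for $\delta \in ]0, 1[$ let

$$\rho^\delta = (1 - \delta)\rho + \delta (dx \otimes dt), \quad \omega^\delta = (1 - \delta)\omega, \quad \zeta^\delta = (1 - \delta)\zeta$$

so that $\rho^\delta$ is always positive on sets with nonempty interior and $(\rho^\delta, \omega^\delta, \zeta^\delta) \in \mathcal{CE}_0^1(\rho_0^\delta, \rho_1^\delta)$. By convexity,

$$J_D(\rho^\delta, \omega^\delta, \zeta^\delta) \le J_D(\rho, \omega, \zeta) .$$

Since $\rho_0^\delta \rightharpoonup^* \rho_0$ and $\rho_1^\delta \rightharpoonup^* \rho_1$ as $\delta \to 0$ and $C_K$ is continuous for the weak* topology, it is sufficient to prove $J_D(\rho^\delta, \omega^\delta, \zeta^\delta) \ge C_K(\rho_0^\delta, \rho_1^\delta)$ in order to prove $J_D(\rho, \omega, \zeta) \ge C_K(\rho_0, \rho_1)$.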

In order to alleviate notations we shall now denote $\rho^\delta, \omega^\delta, \zeta^\delta$ by just $\rho, \omega, \zeta$. Also, we denote by $B^d(a, r)$ the open ball of radius $r$ centered at point $a$ in $\mathbb{R}^d$. Up to a translation, we can assume that 0 is in the interior of the set of points w.r.t. which $\Omega$ is star shaped. Then [?, Theorem 5.3] tells us that the Minkowski gauge $x \mapsto \inf\{\lambda > 0 : x \in \lambda\Omega\}$ is Lipschitz; let us denote by $k > 0$ its Lipschitz
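constant. We introduce the regularizing kernel $r_\varepsilon(t, x) = \frac{1}{\varepsilon^d} r_1\!\left(\frac{x}{\varepsilon}\right) \frac{1}{\varepsilon} r_2\!\left(\frac{t}{\varepsilon}\right)$ where $r_1 \in C_c^\infty(B^d(0, \frac{1}{2k}))$, $r_2 \in C_c^\infty(B^1(0, \frac{1}{2}))$, $r_i \ge 0$, $\int r_i = 1$, $r_i$ even ($i = 1, 2$). Let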
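$\tilde\mu = ((1 + \varepsilon)^{-1} \tilde\rho, (1 + \varepsilon)^{-1} \omega, \zeta)$ where $\tilde\rho$ is a measure on $[-\varepsilon, 1 + \varepsilon] \times \Omega$ which equals $\rho$ on $[0, 1] \times \Omega$, $\rho_0 \otimes dt$ on $[-\varepsilon, 0[ \times \Omega$ and $\rho_1 \otimes dt$ on $]1, 1 + \varepsilon] \times \Omega$. By the glueing property in Proposition 4.1, $(\tilde\rho, \omega, \zeta)$ still satisfies the continuity equation. Then define

$$\mu^\varepsilon \stackrel{\mathrm{def.}}{=} T_\#(\tilde\mu * r_\varepsilon)\big|_{[0,1] \times \Omega},$$

where $T : (t, x) \mapsto ((1 + \varepsilon)^{-1}(t + \varepsilon/2), (1 + \varepsilon)^{-1} x)$ is built in such a way that the image of the time segment $[-\varepsilon/2, 1 + \varepsilon/2]$ is $[0, 1]$. Furthermore, since the Minkowski gauge of $\Omega$ is $k$-Lipschitz, the image of $\Omega_\varepsilon \stackrel{\mathrm{def.}}{=} \Omega + B^d(0, \varepsilon/k)$ by $T$ is included in $\Omega$. Now, by the smoothing and scaling properties in Proposition 4.1, it holds $\mu^\varepsilon \in \mathcal{CE}_0^1(\rho_0^\varepsilon, \rho_1^\varepsilon)$, in particular because we took care of multiplying $\rho$ and $\omega$ by a factor $(1 + \varepsilon)^{-1}$ in the definition of $\tilde\mu$. Notice that $\rho_0^\varepsilon \rightharpoonup^* \rho_0$ and $\rho_1^\varepsilon \rightharpoonup^* \rho_1$ when $\varepsilon \to 0$ since they are the evaluations of $\tilde\rho * r_\varepsilon$ at time $-\varepsilon/2$ and $1 + \varepsilon/2$, respectively (also contracted in space by a factor $1 + \varepsilon$). Moreover, the vector fields $\omega_t^\varepsilon / \rho_t^\varepsilon$ and $\zeta_t^\varepsilon / \rho_t^\varepsilon$ are well-defined, smooth, bounded functions on $[0, 1] \times \Omega$. Therefore, by Step 2,

$$J_D(\mu^\varepsilon) \ge C_K(\rho_0^\varepsilon, \rho_1^\varepsilon) .$$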

On the other hand, for any e' > 0, one has 
Jd{p £ )=[ f f(y, ^)d\p E \(s,y) 

J 0 J 

-J 2 _/((! - £ )y> J0r-\) d \P* r e\(s,y) 

( 2 ) rl+e/2 f f 

< / d\p\(t,x) dsdyf((l-£)y,Sr(t,x))-r e (s-t,y-x) 

J —e/2 ./fie 

-Y f [ d\p\{t,x) [ dsdyfi(-Mr{t,x))Xi((l-£)y)r £ (s-t,y-x) 

i Jo Jil J R!+ d lMl 

< f j (l + e?)f(x,-fe)d\p,\(t,x) 

•J 0 J 

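where we used (1) the change of variable formula, (2) the convexity and homogeneity of $f$, (3) the multiplicative dependence in $x$ (with the integrability conditions) assumed on $f$ and (4) the continuity of the strictly positive factors $(\lambda_i)_i$, where $\varepsilon$, chosen small enough, depends on $\varepsilon'$. Therefore $J_D(\tilde\mu) \ge J_D(\mu^\varepsilon)(1 + \varepsilon')^{-1}$. But by convexity and homogeneity, $J_D(\tilde\mu) \le (1 + \varepsilon)^{-1}\big((1 - \varepsilon) J_D(\rho, \omega, \zeta) + \varepsilon J_D(\rho, \omega, 2\zeta)\big)$. The term $J_D(\rho, \omega, 2\zeta)$ is finite if $J_D(\rho, \omega, \zeta)$ is finite (by our assumption on $f$) and one has

$$C_K(\rho_0^\varepsilon, \rho_1^\varepsilon) \le \frac{1 + \varepsilon'}{1 + \varepsilon} \big( (1 - \varepsilon) J_D(\rho, \omega, \zeta) + \varepsilon J_D(\rho, \omega, 2\zeta) \big) .$$

Letting $\varepsilon'$ and $\varepsilon$ go to 0, using the continuity of $C_K$ under weak* convergence and taking the infimum, one recovers in the end $C_K(\rho_0, \rho_1) \le C_D(\rho_0, \rho_1)$ as desired.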

Step 4. We have proven in Step 1 that if $\rho_0, \rho_1 \in \mathcal{M}_+^{\mathrm{at}}(\Omega)$, then

$$C_D(\rho_0, \rho_1) \le C_K(\rho_0, \rho_1) .$$

Moreover, by Step 3, for all $\rho_0, \rho_1 \in \mathcal{M}_+(\Omega)$ one has

$$C_D(\rho_0, \rho_1) \ge C_K(\rho_0, \rho_1)$$

and thus $C_D = C_K$ for atomic measures. But $C_D$ is weakly* l.s.c. since $J_D$ is l.s.c. by Reshetnyak lower-semicontinuity (which requires integrating on an open set, but one can reduce to that case as in Proposition 3.1). Finally, by density of $\mathcal{M}_+^{\mathrm{at}}(\Omega)$ in $\mathcal{M}_+(\Omega)$, for $\rho_0, \rho_1 \in \mathcal{M}_+(\Omega)$, $C_D(\rho_0, \rho_1) \le C_K(\rho_0, \rho_1)$. The equality $c = c_d$ is direct by computing $C_K$ between Dirac measures, and because $c$ is subadditive. □
5 Examples


In this section we discuss some examples which fit into the framework developed in Sections 3 and 4. Optimal partial transport is treated first, followed by our initial motivating example: the Wasserstein-Fisher-Rao metric. In this section, $\Omega$ is a convex compact set in $\mathbb{R}^d$.


5.1 An Optimal Partial Transport Problem 

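We consider an optimal transport problem with relaxed marginal constraints, which for $\rho_0, \rho_1 \in \mathcal{M}_+(\Omega)$ and $p \in \mathbb{N}$, $p \ge 2$, consists in solving

$$\inf_{\tilde\rho_0, \tilde\rho_1} \frac{1}{p} W_p^p(\tilde\rho_0, \tilde\rho_1) + \delta \cdot \big( |\rho_0 - \tilde\rho_0|_{TV} + |\rho_1 - \tilde\rho_1|_{TV} \big) . \tag{5.1}$$

Proposition 5.1. The value of the infimum is left unchanged when adding the constraints $\tilde\rho_i \le \rho_i$ ($i = 0, 1$).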

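Proof. Let $\gamma \in \mathcal{M}_+(\Omega^2)$ be any coupling between $\tilde\rho_0$ and $\tilde\rho_1$ and let $\gamma^* \in \mathcal{M}_+(\Omega^2)$ be such that $\gamma^* \le \gamma$ and $(\mathrm{Proj}_0)_\# \gamma^* = \rho_0 \wedge (\mathrm{Proj}_0)_\# \gamma$. By construction, one has

$$|\rho_0 - (\mathrm{Proj}_0)_\# \gamma| - |\rho_0 - (\mathrm{Proj}_0)_\# \gamma^*| = |(\mathrm{Proj}_0)_\# \gamma - (\mathrm{Proj}_0)_\# \gamma^*| = |\gamma - \gamma^*|,$$

$$|\rho_1 - (\mathrm{Proj}_1)_\# \gamma| - |\rho_1 - (\mathrm{Proj}_1)_\# \gamma^*| \ge - |(\mathrm{Proj}_1)_\# \gamma - (\mathrm{Proj}_1)_\# \gamma^*| = - |\gamma - \gamma^*| .$$

By denoting $F$ the functional in (5.1) written as a function of a coupling, it holds

$$F(\gamma) - F(\gamma^*) \ge \int_{\Omega^2} (|y - x|^p / p) \, d(\gamma - \gamma^*) \ge 0 .$$

A similar truncation procedure for the other marginal leads to the result. □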

Problem (5.1) is similar to the distances introduced in [ ] and [ 1 ], defined as the $p$-th root of

$$\inf_{\tilde\rho_0, \tilde\rho_1} W_p^p(\tilde\rho_0, \tilde\rho_1) + \delta \big( |\rho_0 - \tilde\rho_0|_{TV} + |\rho_1 - \tilde\rho_1|_{TV} \big) \tag{5.2}$$

with the difference that the problem we consider is invariant under mass rescaling. The link between this problem and the optimal partial transport problem, i.e. an optimal transport problem where one chooses the amount of mass which is transported, was mentioned in [ ] and is recalled below. Note that what follows is also true for the case $p = 1$ and for more general costs on $\Omega^2$, up to a slight adaptation of the duality formulas. The next proposition states that problem (5.1) fits into our framework if we define the cost function as

$$c(x_0, m_0, x_1, m_1) = \min\left( \frac{|x_1 - x_0|^p}{p}, \, 2\delta \right) \cdot \min(m_0, m_1) + \delta |m_1 - m_0| \tag{5.3}$$

which is l.s.c. and jointly sublinear in the variables $(m_0, m_1)$.
Proposition 5.2 (Link to our framework and optimal partial transport). Let $C_K$ be the static cost defined as in (3.3), with the cost function (5.3). Then it is equal to (5.1) and one has

$$C_K(\rho_0, \rho_1) - \delta \big( \rho_0(\Omega) + \rho_1(\Omega) \big) = \inf_{\gamma \in \Gamma_\le(\rho_0, \rho_1)} \int_{\Omega^2} \big( |x_1 - x_0|^p / p - 2\delta \big) \, d\gamma \tag{5.4}$$

where $\Gamma_\le(\rho_0, \rho_1)$ is the subset of $\mathcal{M}_+(\Omega^2)$ such that the first and second marginals are upper bounded by $\rho_0$ and $\rho_1$, respectively.

Remark. The right-hand side of (5.4) is the Lagrangian formulation of the optimal partial transport problem. In this formulation, one replaces the amount of mass $m$ to be transported by a Lagrange multiplier, which corresponds to $2\delta$. It is clear that our problem computes an optimal partial transport for some $m$ (the total mass of the optimal $\gamma$) but in general, not all values of $m$ can be recovered by varying $\delta$ (think of atomic measures). This is however the case under the assumptions of [CM10, Corollary 2.11].

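Proof. Let us denote by $C_{par}$ the infimum in the right-hand side of (5.4). For $\rho_0, \rho_1 \in \mathcal{M}_+(\Omega)$ and any semi-couplings $(\gamma_0, \gamma_1) \in \Gamma(\rho_0, \rho_1)$, let $m_0, m_1$ be the densities of $\gamma_0, \gamma_1$ w.r.t. some $\gamma \in \mathcal{M}_+(\Omega^2)$ with $\gamma_0, \gamma_1 \ll \gamma$. Introduce $\bar\gamma = \min(m_0, m_1) \cdot \gamma|_D$ with $D = \{ (x_0, x_1) \in \Omega^2 : |x_1 - x_0|^p / p \le 2\delta \}$. It holds $\bar\gamma \in \Gamma_\le(\rho_0, \rho_1)$ and

$$\begin{aligned}
J_K(\gamma_0, \gamma_1) &= \int_{\Omega^2} c(x_0, m_0, x_1, m_1) \, d\gamma(x_0, x_1) \\
&= \int_{\Omega^2} \frac{1}{p} |x_1 - x_0|^p \, d\bar\gamma + \delta \int_{\Omega^2} \big( d|\gamma_0 - \bar\gamma| + d|\gamma_1 - \bar\gamma| \big) \\
&= \delta |\rho_0|_{TV} + \delta |\rho_1|_{TV} + \int_{\Omega^2} \left( \frac{1}{p} |x_1 - x_0|^p - 2\delta \right) d\bar\gamma .
\end{aligned}$$

This implies $C_K - \delta(|\rho_0|_{TV} + |\rho_1|_{TV}) \ge C_{par}$. The opposite inequality comes from the remark that the infimum defining $C_{par}$ is unchanged by adding the constraint $\mathrm{supp}(\gamma) \subset D$. Let $\gamma \in \Gamma_\le(\rho_0, \rho_1)$ be supported on $D$ and define for $i \in \{0, 1\}$ $\tilde\rho_i = \rho_i - (\mathrm{Proj}_i)_\# \gamma \in \mathcal{M}_+(\Omega)$. Let us build a couple of semi-couplings

$$\gamma_i = \gamma + \mathrm{diag}_\#(\tilde\rho_0 \wedge \tilde\rho_1) + \mathrm{diag}_\#(\tilde\rho_i - \tilde\rho_0 \wedge \tilde\rho_1), \quad i \in \{0, 1\},$$

where $\mathrm{diag} : x \mapsto (x, x)$ lifts $\Omega$ to the diagonal in $\Omega^2$. Decomposed this way, one obtains directly

$$J_K(\gamma_0, \gamma_1) \le \delta |\rho_0|_{TV} + \delta |\rho_1|_{TV} + \int_{\Omega^2} \left( \frac{1}{p} |x_1 - x_0|^p - 2\delta \right) d\gamma .$$

Hence $C_K - \delta(|\rho_0|_{TV} + |\rho_1|_{TV}) \le C_{par}$. Finally, one has $C_{par} + \delta(|\rho_0|_{TV} + |\rho_1|_{TV}) = \inf (5.1)$ directly, by applying Proposition 5.1 and rewriting (5.1) as a minimization on a variable $\gamma \in \Gamma_\le(\rho_0, \rho_1)$. □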
Now consider the infinitesimal cost

$$f : (\rho, \omega, \zeta) \mapsto \begin{cases} \dfrac{1}{p} \dfrac{|\omega|^p}{\rho^{p-1}} + \delta |\zeta| & \text{if } \rho > 0 \\ \delta |\zeta| & \text{if } \rho = |\omega| = 0 \\ +\infty & \text{otherwise.} \end{cases} \tag{5.5}$$

which satisfies the conditions of Definition 4.2 (it is implicitly independent of the space variable $x$).


Proposition 5.3 (Cost on the path space). The static and dynamic costs defined in (5.3) and (5.5) are related by

$$c(x_0, m_0, x_1, m_1) = \inf_{(x(t), m(t))} \int_0^1 f(m(t), m(t) x'(t), m'(t)) \, dt \tag{5.6}$$

for $(x(t), m(t)) \in C^1([0,1], \Omega \times [0, +\infty[)$, $x(i) = x_i$ and $m(i) = m_i$, for $i = 0, 1$.

Proof. First, for any $C^1$ path $(x, m)$ we denote by $\underline{m} = \min_{t \in [0,1]} m(t)$ its minimum mass. It holds

$$\begin{aligned}
\int_0^1 f(m(t), m(t) x'(t), m'(t)) \, dt &= \int_0^1 \frac{1}{p} |x'(t)|^p m(t) \, dt + \delta \int_0^1 |m'(t)| \, dt \\
&\ge \underline{m} \int_0^1 \frac{1}{p} |x'(t)|^p \, dt + \delta \big( |m_0 - \underline{m}| + |m_1 - \underline{m}| \big) \\
&\ge c(x_0, m_0, x_1, m_1) .
\end{aligned}$$

For the opposite inequality, let us build a minimizing sequence. The infimum is clearly left unchanged when one considers piecewise $C^1$ trajectories. In the case where $|x_0 - x_1|^p / p \le 2\delta$, we divide the time interval into three segments $[0, \varepsilon]$, $[\varepsilon, 1 - \varepsilon]$ and $[1 - \varepsilon, 1]$ and build a piecewise $C^1$ trajectory by making pure variation of mass (or staying in place) in the first and the third segment, and constant speed transport of the mass $\underline{m} = \min(m_0, m_1)$ during the second segment. This way, we obtain that the right-hand side of (5.6) is upper bounded by

$$\delta |m_1 - m_0| + \lim_{\varepsilon \to 0} \int_\varepsilon^{1 - \varepsilon} \frac{1}{p} |x'(t)|^p \, \underline{m} \, dt = c(x_0, m_0, x_1, m_1) .$$

In the case $|x_0 - x_1|^p / p > 2\delta$, one obtains the same inequality by building a similar path, but by transporting only an amount $\varepsilon$ of mass in the second segment. □
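The three-segment construction used in this proof can be evaluated in closed form and compared with the static cost (5.3). The following is a minimal sketch under simplifying assumptions that are not in the article: positions are points of the real line, the action of each segment is computed analytically rather than by numerical integration, and the function names `partial_cost` and `three_segment_action` are hypothetical.

```python
def partial_cost(x0, m0, x1, m1, p=2, delta=1.0):
    """Closed-form static cost (5.3) for points on the real line."""
    return min(abs(x1 - x0) ** p / p, 2.0 * delta) * min(m0, m1) + delta * abs(m1 - m0)

def three_segment_action(x0, m0, x1, m1, p=2, delta=1.0, eps=1e-6):
    """Action of the piecewise C^1 path built in the proof of Proposition 5.3.

    Segments [0, eps] and [1 - eps, 1]: pure mass variation at a fixed position.
    Segment [eps, 1 - eps]: constant-speed transport of a mass m_bar.
    """
    dist = abs(x1 - x0)
    if dist ** p / p <= 2.0 * delta:
        m_bar = min(m0, m1)        # transport as much mass as possible
    else:
        m_bar = min(eps, m0, m1)   # transport only a vanishing amount of mass
    duration = 1.0 - 2.0 * eps
    speed = dist / duration
    transport = m_bar * speed ** p / p * duration           # integral of |x'|^p m / p
    growth = delta * (abs(m0 - m_bar) + abs(m1 - m_bar))    # delta * integral of |m'|
    return transport + growth

near = (0.0, 1.0, 1.5, 3.0)   # |x1 - x0|^2 / 2 = 1.125 <= 2 * delta
far = (0.0, 1.0, 5.0, 3.0)    # |x1 - x0|^2 / 2 = 12.5  >  2 * delta
gap_near = three_segment_action(*near) - partial_cost(*near)
gap_far = three_segment_action(*far) - partial_cost(*far)
```

In both regimes the gap is nonnegative and shrinks with `eps`, which illustrates why the path-space infimum in (5.6) is approached but, in general, not attained by a $C^1$ path.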


Theorem 5.4. The equivalence between the static and the dynamic formulations $C_D = C_K$ holds and $C_K^{1/p}$ defines a distance on $\mathcal{M}_+(\Omega)$ which is continuous under weak* convergence. We have the dual formulation

$$C_K(\rho_0, \rho_1) = \sup_{(\phi, \psi) \in C(\Omega)^2} \int_\Omega \phi \, d\rho_0 + \int_\Omega \psi \, d\rho_1$$

subject to, for all $(x, y) \in \Omega^2$, $\phi(x) + \psi(y) \le \frac{1}{p} |y - x|^p$ and $\phi(x), \psi(y) \le \delta$. Equivalently,

$$C_K(\rho_0, \rho_1) = \sup_{\varphi \in C^1([0,1] \times \Omega)} \int_\Omega \varphi(1, \cdot) \, d\rho_1 - \int_\Omega \varphi(0, \cdot) \, d\rho_0,$$

subject to $|\varphi| \le \delta$ and $\partial_t \varphi + \frac{p-1}{p} |\nabla \varphi|^{\frac{p}{p-1}} \le 0$.

Proof. We first prove that $c^{1/p}$ defines a metric on $\mathrm{Cone}(\Omega)$. This is mainly a consequence of the inequality

$$(a + c)^p + b + d \le \left( (a^p + b)^{1/p} + (c^p + d)^{1/p} \right)^p$$

which holds true for $(a, b, c, d) \in \mathbb{R}_+^4$ and $p \in \mathbb{N}^*$ (this becomes clear when the right-hand term is expanded). Thus, take $A = (x_0, m_0)$, $B = (x_1, m_1)$ and $C = (x_2, m_2)$, three points in $\Omega \times [0, +\infty[$ satisfying $\frac{1}{p} |x_2 - x_0|^p \le 2\delta$ (the other case is easy) and, without loss of generality, $1 = m_0 \le m_2$. Remark that there always exists $\tilde B = (\tilde x_1, \tilde m_1)$ satisfying $|x_0 - \tilde x_1|^p \le 2p\delta$, $|x_2 - \tilde x_1|^p \le 2p\delta$, $m_0 \le \tilde m_1 \le m_2$ and $c(A, B) \ge c(A, \tilde B)$, $c(C, B) \ge c(C, \tilde B)$. We drop the $1/p$ factor for clarity and obtain

$$\begin{aligned}
c(A, C) &\le \big( |x_2 - \tilde x_1| + |\tilde x_1 - x_0| \big)^p + \big( |m_2 - \tilde m_1| + |\tilde m_1 - m_0| \big) \\
&\le \left( \big( |x_2 - \tilde x_1|^p + |m_2 - \tilde m_1| \big)^{1/p} + \big( |\tilde x_1 - x_0|^p + |\tilde m_1 - m_0| \big)^{1/p} \right)^p \\
&\le \left( c(A, \tilde B)^{1/p} + c(\tilde B, C)^{1/p} \right)^p \le \left( c(A, B)^{1/p} + c(B, C)^{1/p} \right)^p .
\end{aligned}$$

Thus $c$ satisfies all conditions of Theorem 3.3. This implies that $C_K$ is continuous in the weak* topology and thus Theorem 4.3 applies. The duality results are consequences of Proposition 4.2, Theorem 3.5 and direct calculations. □
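The "direct calculations" behind the dynamic dual constraint can be made explicit by computing the polar set of the cost (5.5), as required by Proposition 4.2. The following worked computation is a sketch added for illustration: the dual variables $(a, b, c)$, paired with $(\rho, \omega, \zeta)$, and the conjugate exponent $q = p/(p-1)$ are notation introduced here, not taken from the article.

```latex
\begin{aligned}
\sup_{\rho > 0,\ \omega,\ \zeta}
  \Big\{ a\rho + b \cdot \omega + c\,\zeta
         - \tfrac{1}{p} \tfrac{|\omega|^p}{\rho^{p-1}} - \delta |\zeta| \Big\}
&= \sup_{\rho > 0,\ v}\ \rho \Big( a + b \cdot v - \tfrac{1}{p} |v|^p \Big)
   + \sup_{\zeta}\ \big( c\,\zeta - \delta |\zeta| \big)
   && (\omega = \rho v) \\
&= \sup_{\rho > 0}\ \rho \Big( a + \tfrac{1}{q} |b|^{q} \Big)
   + \iota_{\{ |c| \le \delta \}}(c),
   && q = \tfrac{p}{p-1} .
\end{aligned}
```

The supremum is finite (and then equal to 0) exactly when $a + \frac{p-1}{p} |b|^{p/(p-1)} \le 0$ and $|c| \le \delta$. Substituting $(a, b, c) = (\partial_t \varphi, \nabla \varphi, \varphi)$ gives the constraints $|\varphi| \le \delta$ and $\partial_t \varphi + \frac{p-1}{p} |\nabla \varphi|^{p/(p-1)} \le 0$.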


5.2 A Static Formulation for WF 


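Definition 5.1 (The WF distance [ SPV15, KMV15]). For a parameter $\delta \in ]0, +\infty[$ consider the convex, positively homogeneous, l.s.c. function

$$f : \mathbb{R} \times \mathbb{R}^d \times \mathbb{R} \ni (\rho, \omega, \zeta) \mapsto \begin{cases} \dfrac{1}{2} \dfrac{|\omega|^2 + \delta^2 \zeta^2}{\rho} & \text{if } \rho > 0, \\ 0 & \text{if } (\rho, \omega, \zeta) = (0, 0, 0), \\ +\infty & \text{otherwise,} \end{cases} \tag{5.7}$$

and define, for $\rho_0, \rho_1 \in \mathcal{M}_+(\Omega)$,

$$WF(\rho_0, \rho_1)^2 \stackrel{\mathrm{def.}}{=} \inf_{(\rho, \omega, \zeta) \in \mathcal{CE}_0^1(\rho_0, \rho_1)} \int_{[0,1] \times \Omega} f\left( \frac{d\rho}{d\lambda}, \frac{d\omega}{d\lambda}, \frac{d\zeta}{d\lambda} \right) d\lambda \tag{5.8}$$

where $\lambda \in \mathcal{M}_+([0,1] \times \Omega)$ is chosen such that $\rho, \omega, \zeta \ll \lambda$. Due to the 1-homogeneity of $f$ the integral does not depend on the choice of $\lambda$.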


We now show that WF admits a static formulation, which belongs to the class of models introduced in Section 3. First, it is clear that WF fits into the previous framework if we choose the cost function $c$ to be $WF(m_0 \delta_{x_0}, m_1 \delta_{x_1})^2$. This distance has been computed in [ ] and is given by

$$c(x_0, m_0, x_1, m_1) = 2\delta^2 \left( m_0 + m_1 - 2\sqrt{m_0 m_1} \cdot \overline{\cos}\big( |x_0 - x_1| / (2\delta) \big) \right) \tag{5.9}$$

where $\overline{\cos} : z \mapsto \cos(|z| \wedge \frac{\pi}{2})$. Remark that $c$ is a cost function and that its square root defines a distance on $\Omega \times \mathbb{R}_+$ (where we identify the points with zero mass $\Omega \times \{0\}$) since it is derived from the WF distance restricted to Dirac measures. It is direct to see that $c$ satisfies the assumptions of Theorem 3.3, for $p = 2$.
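The cost (5.9) is simple to evaluate, and the metric property of its square root can be sanity-checked on random triples. The following is a minimal sketch, not taken from the article: it assumes positions on the real line, and the function names `wf_dirac_cost` and `wf_dirac_dist` are hypothetical.

```python
import math
import random

def wf_dirac_cost(x0, m0, x1, m1, delta=1.0):
    """Cost (5.9) between the Diracs m0*delta_{x0} and m1*delta_{x1} on the real line."""
    angle = min(abs(x1 - x0) / (2.0 * delta), math.pi / 2.0)   # truncated argument
    return 2.0 * delta ** 2 * (m0 + m1 - 2.0 * math.sqrt(m0 * m1) * math.cos(angle))

def wf_dirac_dist(a, b, delta=1.0):
    """Square root of the cost, for points a = (x, m) and b = (x, m)."""
    return math.sqrt(max(wf_dirac_cost(a[0], a[1], b[0], b[1], delta), 0.0))

random.seed(0)
worst = 0.0   # largest violation of the triangle inequality found so far
for _ in range(10000):
    A, B, C = [(random.uniform(-5, 5), random.uniform(0, 3)) for _ in range(3)]
    violation = wf_dirac_dist(A, C) - wf_dirac_dist(A, B) - wf_dirac_dist(B, C)
    worst = max(worst, violation)

# Beyond the cut locus |x1 - x0| >= pi * delta the cosine term vanishes:
far_cost = wf_dirac_cost(0.0, 1.0, 4.0, 2.0)     # equals 2 * delta^2 * (m0 + m1)
# At a common position only the growth term remains:
same_point = wf_dirac_cost(0.0, 1.0, 0.0, 4.0)   # equals 2 * delta^2 * (sqrt(m1) - sqrt(m0))^2
```

A random search of this kind cannot prove the triangle inequality; it only fails to find a counterexample, which is consistent with the statement in the text.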

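Theorem 5.5 (Continuous static formulation). Choosing the cost function (5.9), it holds

$$WF^2(\rho_0, \rho_1) = \min_{(\gamma_0, \gamma_1) \in \Gamma(\rho_0, \rho_1)} J_K(\gamma_0, \gamma_1) . \tag{5.10}$$

Proof. This is a particular case of Theorem 4.3. □

Remark. This theorem can be reformulated, in a nutshell, as

$$\frac{1}{2\delta^2} WF^2(\rho_0, \rho_1) = |\rho_0|_{TV} + |\rho_1|_{TV} + \inf_{(\gamma_0, \gamma_1) \in \Gamma(\rho_0, \rho_1)} -2 \int_{|y - x| < \delta\pi} \cos\big( |y - x| / (2\delta) \big) \, d(\sqrt{\gamma_0 \gamma_1})(x, y) .$$

where $\sqrt{\gamma_0 \gamma_1} \stackrel{\mathrm{def.}}{=} \left( \frac{d\gamma_0}{d\gamma} \frac{d\gamma_1}{d\gamma} \right)^{1/2} \gamma$ for any $\gamma$ such that $\gamma_0, \gamma_1 \ll \gamma$.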

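Corollary 5.6. It holds

$$\frac{1}{2\delta^2} WF^2(\rho_0, \rho_1) = \sup_{(\phi, \psi) \in C(\Omega)^2} \int_\Omega \phi(x) \, d\rho_0 + \int_\Omega \psi(y) \, d\rho_1$$

subject to, $\forall (x, y) \in \Omega^2$: $\phi(x) \le 1$, $\psi(y) \le 1$,

$$(1 - \phi(x))(1 - \psi(y)) \ge \overline{\cos}^2\big( |x - y| / (2\delta) \big) .$$

Proof. By direct computations we find that $\frac{1}{2\delta^2} c(x, \cdot, y, \cdot) = \iota^*_{Q(x,y)}$ with

$$Q(x, y) = \left\{ (a, b) \in \mathbb{R}^2 : a, b \le 1 \text{ and } (1 - a)(1 - b) \ge \overline{\cos}^2\big( |y - x| / (2\delta) \big) \right\}$$

and apply Theorem 3.5. □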
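The conjugacy relation used in this proof can be checked numerically for fixed masses. The following is a minimal sketch, not from the article: `theta` stands for the value of the truncated cosine at a fixed pair $(x, y)$, the cost is normalized by $2\delta^2$, the candidate maximizer is derived here from the arithmetic-geometric mean inequality, and the helper names are hypothetical.

```python
import math
import random

def support_value(m0, m1, theta):
    """Closed form m0 + m1 - 2*sqrt(m0*m1)*theta of the normalized cost c / (2 delta^2)."""
    return m0 + m1 - 2.0 * math.sqrt(m0 * m1) * theta

def in_Q(a, b, theta):
    """Membership in Q = {a <= 1, b <= 1 and (1 - a)(1 - b) >= theta^2}."""
    return a <= 1.0 and b <= 1.0 and (1.0 - a) * (1.0 - b) >= theta ** 2

m0, m1, theta = 2.0, 0.5, 0.6
# Candidate maximizer of a*m0 + b*m1 over Q, located on the boundary of Q
a_star = 1.0 - theta * math.sqrt(m1 / m0)
b_star = 1.0 - theta * math.sqrt(m0 / m1)
best = a_star * m0 + b_star * m1

random.seed(1)
max_sampled = -math.inf   # best value found among random feasible points
for _ in range(100000):
    a, b = random.uniform(-10, 1), random.uniform(-10, 1)
    if in_Q(a, b, theta):
        max_sampled = max(max_sampled, a * m0 + b * m1)
```

The sampled values never exceed `best`, and `best` coincides with the closed form, in line with the identification of the normalized cost as the support function of $Q(x, y)$.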


5.3 Γ-convergence of Static WF

In [ ] the limit $\delta \to \infty$ of the growth penalty parameter of WF is studied and related to classical optimal transport. Here we give the corresponding result for the static problems in terms of Γ-convergence [Bra02]. Recall that this implies both convergence of the optimal values as well as convergence of minimizers. We now denote by $c_\delta$ the cost defined in (5.9) to emphasize its dependency on $\delta$.

Theorem 5.7 ((Almost) Classical OT as Limit of WF). Consider the following two generalized static optimal transport problems:

$$J_\delta(\gamma_0, \gamma_1) = \int_{\Omega^2} c_\delta\big( x, \gamma_0(x, y), y, \gamma_1(x, y) \big) \, dx \, dy - 2\delta^2 \left( \sqrt{\gamma_0(\Omega^2)} - \sqrt{\gamma_1(\Omega^2)} \right)^2 \tag{5.11}$$

$$J_\infty(\gamma_0, \gamma_1) = \begin{cases} 0 & \text{if } \gamma_0 = 0 \text{ or } \gamma_1 = 0, \\ \displaystyle \int_{\Omega^2} |x - y|^2 \, d\gamma_0(x, y) \cdot \frac{\sqrt{\alpha}}{2} & \text{if } \gamma_1 = \alpha \gamma_0 \text{ for some } \alpha > 0, \\ +\infty & \text{otherwise.} \end{cases} \tag{5.12}$$

Then $J_\delta$ Γ-converges to $J_\infty$ as $\delta \to \infty$.

Remark. One has $\lim_{\delta \to \infty} WF(\rho_0, \rho_1) = \infty$ if $\rho_0(\Omega) \neq \rho_1(\Omega)$. Consequently, to properly study the limit, we subtract the diverging terms in (5.11). Conversely, we slightly modify the classical OT functional, to assign finite cost when the two couplings are strict multiples of each other. The corresponding optimization problem is solved by computing the optimal transport plan between normalized marginals and then multiplying by the geometric mean of the marginal masses.

The proof uses the following Lemma. 

Lemma 5.8 (Sqrt-Measure). Let $A \subset \mathbb{R}^n$ be a compact set. The function

$$\mathcal{M}_+(A)^2 \ni \mu = (\mu_1, \mu_2) \mapsto -\sqrt{\mu_1 \cdot \mu_2}(A) \tag{5.13}$$

is weakly* l.s.c. and bounded from below by $-\sqrt{\mu_1(A)} \cdot \sqrt{\mu_2(A)}$. The lower bound is only attained if $\mu_1 = 0$ or $\mu_2 = 0$ or $\mu_1 = \alpha \cdot \mu_2$ for some $\alpha > 0$.

Proof. With $f(x) = (\sqrt{x_1} - \sqrt{x_2})^2 / 2$ and a reference measure $\nu \in \mathcal{M}_+(A)$ with $\mu \ll \nu$ we can write

$$-\sqrt{\mu_1 \cdot \mu_2}(A) = \int_A f\left( \frac{d\mu}{d\nu} \right) d\nu - \mu_1(A)/2 - \mu_2(A)/2$$

Since $f$ is 1-homogeneous, the evaluation does not depend on the choice of $\nu$. As $f$ is convex, l.s.c., bounded from below, $A$ is bounded, and total masses converge, lower semi-continuity of the functional now follows from [ FP00, Thm. 2.38] (see the proof of Proposition 3.1 for the adaptation to closed $\Omega$).

For the lower bound, let $\mu_1 = \lambda \cdot \mu_2 + \mu_{1,\perp}$ be the Radon-Nikodym decomposition of $\mu_1$ w.r.t. $\mu_2$. Then we have

$$-\sqrt{\mu_1 \cdot \mu_2}(A) = -\int_A \sqrt{\lambda} \, d\mu_2 \ge -\left( \int_A \lambda \, d\mu_2 \cdot \mu_2(A) \right)^{1/2} \ge -\big( \mu_1(A) \cdot \mu_2(A) \big)^{1/2}$$

where the first inequality is due to Jensen's inequality, with equality only if $\lambda$ is constant. The second inequality is only an equality if $\mu_{1,\perp} = 0$. □
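For measures supported on finitely many common atoms, the bound of Lemma 5.8 reduces to the Cauchy-Schwarz inequality, which makes it easy to illustrate. The following is a minimal sketch, not from the article: measures are represented by nonnegative weight vectors on a shared finite support, and the function name `sqrt_measure_mass` is hypothetical.

```python
import numpy as np

def sqrt_measure_mass(mu1, mu2):
    """Total mass of sqrt(mu1 * mu2) for discrete measures given by weight vectors."""
    return float(np.sum(np.sqrt(np.asarray(mu1) * np.asarray(mu2))))

rng = np.random.default_rng(0)
mu1 = rng.uniform(0.0, 1.0, 20)
mu2 = rng.uniform(0.0, 1.0, 20)

# Generic pair: the bound sqrt(mu1(A) * mu2(A)) is not attained
generic = sqrt_measure_mass(mu1, mu2)
generic_bound = float(np.sqrt(mu1.sum() * mu2.sum()))

# Proportional pair mu1 = alpha * mu2: the bound is attained
alpha = 2.5
proportional = sqrt_measure_mass(alpha * mu2, mu2)
proportional_bound = float(np.sqrt(alpha * mu2.sum() * mu2.sum()))
```

The generic pair stays strictly below its bound while the proportional pair matches it up to rounding, which mirrors the equality case stated in the lemma (with the sign reversed, since the lemma considers the negative of this quantity).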


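Proof of Theorem 5.7. Lim-Sup. For every pair $(\gamma_0, \gamma_1)$ a recovery sequence is given by the constant sequence $(\gamma_0, \gamma_1)_{n \in \mathbb{N}}$. The cases $\gamma_i = 0$ for $i = 0$ or 1, and $\gamma_1 \neq \alpha \gamma_0$ for every $\alpha > 0$ are trivial. Therefore, let now $\gamma_1 = \alpha \gamma_0$ for some $\alpha > 0$. We find

$$J_\delta(\gamma_0, \alpha \gamma_0) = \int_{\Omega^2} 2\delta^2 \left[ (1 + \alpha) - 2\sqrt{\alpha} \cos\big( |x - y| / (2\delta) \big) \right] d\gamma_0(x, y) - 2\delta^2 \gamma_0(\Omega^2) \big( 1 - \sqrt{\alpha} \big)^2$$

Now use $\cos(z) \ge 1 - z^2/2$ to find:

$$J_\delta(\gamma_0, \alpha \gamma_0) \le \int_{\Omega^2} 4\delta^2 \sqrt{\alpha} \, \frac{|x - y|^2}{8\delta^2} \, d\gamma_0(x, y) = J_\infty(\gamma_0, \gamma_1)$$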
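The lim-sup bound can be observed numerically for a discrete coupling and proportional pairs $\gamma_1 = \alpha \gamma_0$. The following is a minimal sketch under assumptions that are not in the article: $\gamma_0$ is a finite sum of weighted Diracs at pairs $(x_k, y_k)$ in $[0, 1]^2$, the values of $\delta$ are large enough that the cosine is never truncated, and the function names are hypothetical. It uses the simplification $(1 + \alpha) - (1 - \sqrt{\alpha})^2 = 2\sqrt{\alpha}$, so that $J_\delta(\gamma_0, \alpha\gamma_0) = \int 4\delta^2 \sqrt{\alpha}\,(1 - \cos(|x-y|/(2\delta)))\, d\gamma_0$.

```python
import numpy as np

def j_delta_proportional(points, weights, alpha, delta):
    """J_delta(gamma0, alpha*gamma0) for gamma0 = sum_k w_k delta_{(x_k, y_k)}.

    Uses 1 - cos(z) = 2 sin^2(z / 2) to avoid cancellation for large delta.
    """
    d = np.abs(points[:, 0] - points[:, 1])
    one_minus_cos = 2.0 * np.sin(d / (4.0 * delta)) ** 2
    return float(np.sum(4.0 * delta ** 2 * np.sqrt(alpha) * one_minus_cos * weights))

def j_infinity_proportional(points, weights, alpha):
    """J_infinity(gamma0, alpha*gamma0) = sqrt(alpha) / 2 * integral of |x - y|^2 d gamma0."""
    d = np.abs(points[:, 0] - points[:, 1])
    return float(np.sqrt(alpha) / 2.0 * np.sum(d ** 2 * weights))

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(30, 2))   # pairs (x_k, y_k)
weights = rng.uniform(0.1, 1.0, 30)            # masses w_k of gamma0
alpha = 3.0

limit = j_infinity_proportional(points, weights, alpha)
values = [j_delta_proportional(points, weights, alpha, delta)
          for delta in (1.0, 10.0, 100.0, 1000.0)]
```

The values increase with $\delta$ and approach `limit` from below, as expected from $\cos(z) \ge 1 - z^2/2$.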
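Lim-Inf. For a sequence of couplings $(\gamma_{0,k}, \gamma_{1,k})_{k \in \mathbb{N}}$ converging weakly* to some pair $(\gamma_{0,\infty}, \gamma_{1,\infty})$ we now study the sequence of values $J_k(\gamma_{0,k}, \gamma_{1,k})$. Note first that $J_\delta$ is weakly* l.s.c. since the integral part is l.s.c. (cf. Proposition 3.1) and the second term is continuous (total masses converge).

Since $\Omega$ is compact, there is some $N_1 \in \mathbb{N}$ such that for $k > N_1$, we have

$$1 - z^2/2 \le \cos(z) \le 1 - z^2/2 + z^4/24 \quad \text{for } z = |x - y| / (2k), \ x, y \in \Omega .$$

And therefore for $k > N_1$ and any coupling pair, denoting $\Delta \stackrel{\mathrm{def.}}{=} \sqrt{\gamma_0(\Omega^2)} - \sqrt{\gamma_1(\Omega^2)}$,

$$\begin{aligned}
J_k(\gamma_0, \gamma_1) = {} & 2k^2 \left( \int_{\Omega^2} \left( \sqrt{\gamma_0(x, y)} - \sqrt{\gamma_1(x, y)} \right)^2 dx \, dy - \Delta^2 \right) \\
& + 4k^2 \int_{\Omega^2} \frac{|x - y|^2}{8k^2} \sqrt{\gamma_0(x, y) \gamma_1(x, y)} \, dx \, dy \\
& - l \cdot 4k^2 \int_{\Omega^2} \frac{1}{24} \left( \frac{|x - y|}{2k} \right)^4 \sqrt{\gamma_0(x, y) \gamma_1(x, y)} \, dx \, dy
\end{aligned}$$

for some $l \in [0, 1]$.

Since $\Omega$ is bounded, by means of Lemma 5.8 and since the total masses of $\gamma_{i,k}$, $i = 0, 1$, are converging towards the total masses of $\gamma_{i,\infty}$ as $k \to \infty$, there is a constant $C > 0$ and some $N_2 > N_1$ such that the coefficient of $l$ in the third line can be bounded by $C/k^2$ for $k > N_2$ when evaluated at the arguments $(\gamma_{0,k}, \gamma_{1,k})$.

For the first line we write briefly $2k^2 F(\gamma_0, \gamma_1)$. From Lemma 5.8 we find that $F$ is weakly* l.s.c., $F \ge 0$ and $F(\gamma_0, \gamma_1) = 0$ if and only if $(\gamma_0, \gamma_1) \in S$ with

$$S = \left\{ (\gamma_0, \gamma_1) \in \mathcal{M}_+(\Omega^2)^2 : \gamma_0 = 0 \text{ or } \gamma_1 = 0 \text{ or } \gamma_1 = \alpha \cdot \gamma_0 \text{ for some } \alpha > 0 \right\} .$$

It follows that for $N_2 < k_1 < k_2$ one has

$$J_{k_2}(\gamma_{0,k_2}, \gamma_{1,k_2}) = J_{k_1}(\gamma_{0,k_2}, \gamma_{1,k_2}) + 2(k_2^2 - k_1^2) F(\gamma_{0,k_2}, \gamma_{1,k_2}) + l \cdot C/k_1^2$$

for some $l \in [-1, 1]$. Now consider the joint limit:

$$\begin{aligned}
\liminf_{k \to \infty} J_k(\gamma_{0,k}, \gamma_{1,k}) &\ge \liminf_{k_2 \to \infty} J_{k_1}(\gamma_{0,k_2}, \gamma_{1,k_2}) + \liminf_{k_2 \to \infty} 2(k_2^2 - k_1^2) F(\gamma_{0,k_2}, \gamma_{1,k_2}) - C/k_1^2 \\
&\ge J_{k_1}(\gamma_{0,\infty}, \gamma_{1,\infty}) + 2(k_2^2 - k_1^2) F(\gamma_{0,\infty}, \gamma_{1,\infty}) - C/k_1^2
\end{aligned}$$

for any $N_2 < k_1 < k_2$ (by using weak* l.s.c. of $J_{k_1}$ and $F$ and non-negativity of $F$). Since $J_{k_1} > -\infty$ and $k_2$ can be chosen arbitrarily large, we find

$$\liminf_{k \to \infty} J_k(\gamma_{0,k}, \gamma_{1,k}) = \infty \quad \text{for } (\gamma_{0,\infty}, \gamma_{1,\infty}) \notin S .$$

By reasoning analogous to the lim-sup case (adding the $z^4$ term in the cos-expansion to get a lower bound and bounding its value as above) we find:

$$\liminf_{k \to \infty} J_k(\gamma_{0,k}, \gamma_{1,k}) \ge J_\infty(\gamma_{0,\infty}, \gamma_{1,\infty}) \quad \text{for } (\gamma_{0,\infty}, \gamma_{1,\infty}) \in S .$$

□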
Conclusion and Perspectives


In this paper, we presented a unified treatment of unbalanced transport that allows for both static and dynamic formulations. Our key findings are (i) a Riemannian submersion from a semi-direct product of groups with an $L^2$ metric to the WF metric, which leads to the computation of the sectional curvature and a Monge formulation, (ii) a new class of static optimal transport formulations involving semi-couplings, (iii) an equivalence between these static formulations and a class of dynamic formulations. Each of these contributions is of independent interest, but the synergy between the static, the dynamic and the Monge problems gives a clear picture of the unbalanced transportation problem. Besides these theoretical advances, we believe that a key aspect of this work is that the proposed static formulation opens the door to a new class of numerical solvers for unbalanced optimal transport. These solvers should leverage the specific structure of the cost $c$ considered for each application, a striking example being the WF cost (5.9).
A Wasserstein-Fisher-Rao as a Weak Metric

In this appendix, we show that the Wasserstein-Fisher-Rao metric is a weak metric in a Sobolev setting but does not admit a Levi-Civita connection. Similar results certainly hold for the Wasserstein metric.

We will work in a smooth Sobolev setting, namely on $\mathrm{Dens}^s(\Omega)$, the space of $C^1$ positive functions on $\Omega$ that are in $H^s(\Omega, \mathbb{R})$ for $s > d/2 + 1$ where $d$ is the dimension of the ambient space of $\Omega$. It is an open subset of $H^s(\Omega, \mathbb{R})$.

Proposition A.l. The WF metric is a weak Riemannian metric on Dens 5 (17). 

Proof. We use the fact that $\mathrm{Dens}^s(\Omega)$ is an open subset of the
Hilbert space $H^s(\Omega,\mathbb{R})$ to work in this coordinate system. The
tangent bundle of $\mathrm{Dens}^s(\Omega)$ is
$\mathrm{Dens}^s(\Omega) \times H^s(\Omega,\mathbb{R})$. Let
$X \in H^s(\Omega,\mathbb{R})$ be a function that will be seen as a tangent vector
at any density $\rho \in \mathrm{Dens}^s(\Omega)$. We denote by $WF(\rho)(X,X)$ the
WF metric evaluated at the point $\rho$ on the tangent vector $X$. We have to
prove that the map from $\mathrm{Dens}^s(\Omega) \times H^s(\Omega)$ into
$\mathbb{R}$ defined by

$$(\rho, X) \mapsto WF(\rho)(X,X)$$

is smooth. Recall that, using the formulation (2.12), $WF(\rho)(X,X)$ is given by

$$WF(\rho)(X,X) = \frac{1}{2} \langle L(\rho)^{-1}(X), X \rangle_{L^2(\Omega)}, \tag{A.1}$$

where $L(\rho) : H^{s+1}(\Omega) \to H^{s-1}(\Omega)$ is the elliptic operator defined by

$$L(\rho)(\phi) = -\nabla \cdot (\rho \nabla \phi) + \rho \phi. \tag{A.2}$$

Therefore the smoothness of $WF(\rho)(X,X)$ reduces to the smoothness of
$L(\rho)^{-1}$ (defined with homogeneous Neumann boundary conditions) as an
operator from $H^{s-1}$ into $H^{s+1}$ with respect to $\rho$. Since $L(\rho)$ is
linear in $\rho$ and using the inverse function theorem on Hilbert manifolds, we
get the result for the $H^{s-1}$ topology in the second variable $X$, which is
even stronger than the desired result. □
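To make formulas (A.1) and (A.2) concrete, here is a minimal numerical sketch on the unit interval. It is not taken from the paper: the finite-volume discretization, the uniform grid of cell centers, and the names `assemble_L`, `solve_tridiagonal` and `wf_metric` are all assumptions made for illustration. It solves $L(\rho)\phi = X$ with zero-flux (homogeneous Neumann) boundary conditions and evaluates $\frac{1}{2}\langle \phi, X\rangle_{L^2}$.

```python
import math


def assemble_L(rho, h):
    """Tridiagonal discretization of L(rho) phi = -(rho phi')' + rho phi.

    Assumed setting (illustrative only): 1D uniform grid of cell centers with
    spacing h and zero flux through both end walls (homogeneous Neumann).
    Returns the three diagonals (lower, diag, upper).
    """
    n = len(rho)
    # Density at the n-1 interior cell interfaces (arithmetic mean).
    face = [0.5 * (rho[i] + rho[i + 1]) for i in range(n - 1)]
    lower, diag, upper = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        left = face[i - 1] / h ** 2 if i > 0 else 0.0    # no flux at left wall
        right = face[i] / h ** 2 if i < n - 1 else 0.0   # no flux at right wall
        lower[i], upper[i] = -left, -right
        diag[i] = left + right + rho[i]
    return lower, diag, upper


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a diagonally dominant tridiagonal system."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def wf_metric(rho, X, h):
    """Discrete analogue of (A.1): 0.5 * <L(rho)^{-1} X, X> in L^2."""
    phi = solve_tridiagonal(*assemble_L(rho, h), X)
    return 0.5 * h * sum(p * x for p, x in zip(phi, X))


if __name__ == "__main__":
    n = 64
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]
    rho = [1.0 + 0.5 * math.cos(2 * math.pi * x) for x in xs]  # positive density
    X = [math.sin(math.pi * x) for x in xs]                    # tangent vector
    print(wf_metric(rho, X, h))
```

Two sanity checks follow directly from (A.2). Since $L(\rho)$ is linear in $\rho$, doubling the density halves the value of the metric; and for $\rho \equiv 1$, $X \equiv 1$ the discrete solution is $\phi \equiv 1$, so the value is $1/2$ up to rounding.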

Following [MMM13], we show the non-existence of the Levi-Civita connection
for the WF metric in the Sobolev setting.

Proposition A.2. The Levi-Civita connection associated with the WF metric
does not exist on $\mathrm{Dens}^s(\Omega)$.

Proof. From [MMM13, page 8], there exists a Levi-Civita connection associated
with a weak Riemannian metric if and only if the metric itself admits gradients
with respect to itself in both variables. Let
$(\rho, X) \in \mathrm{Dens}^s(\Omega) \times H^s(\Omega)$ be an element of the
tangent bundle. The differentiation with respect to $\rho$ of $WF(\rho)(X,X)$
gives the following $L^2$ gradient in the direction $Y \in H^s(\Omega,\mathbb{R})$:

$$\partial_\rho WF(\rho)(X,X)(Y) = \frac{1}{2} \langle |\phi|^2 + |\nabla\phi|^2, Y \rangle_{L^2(\Omega)}, \tag{A.3}$$

where $\phi = L(\rho)^{-1}(X)$.

The gradient with respect to the WF metric is then defined as $L(\rho)(Z)$ where
$Z \stackrel{\text{def}}{=} \frac{1}{2}(|\phi|^2 + |\nabla\phi|^2)$. However,
$Z \in H^s(\Omega)$ and therefore $L(\rho)(Z) \in H^{s-2}$. Thus, the gradient
with respect to $\rho$ does not belong in general to $H^s$, and the Levi-Civita
connection does not exist. □

Note that the key point lies in the loss of smoothness when applying the
elliptic operator. This negative result only means that in this $H^s$ topology, the
weak Riemannian metric WF does not admit a Levi-Civita connection. However,
this result does not preclude the existence of a topology for which the Levi-Civita
connection exists.
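The derivative count behind this loss of smoothness can be written out explicitly. The following is only a sketch: it assumes the mapping property $L(\rho)^{-1} : H^{s-1} \to H^{s+1}$ used in the proof of Proposition A.1 and the standard fact that $H^s(\Omega)$ is an algebra for $s > d/2$.

```latex
\begin{align*}
X \in H^s(\Omega) \subset H^{s-1}(\Omega)
  &\;\Longrightarrow\; \phi = L(\rho)^{-1}(X) \in H^{s+1}(\Omega)
   \;\Longrightarrow\; \nabla\phi \in H^{s}(\Omega), \\
  &\;\Longrightarrow\; |\phi|^2 + |\nabla\phi|^2 \in H^{s}(\Omega), \\
  &\;\Longrightarrow\; L(\rho)\bigl(|\phi|^2 + |\nabla\phi|^2\bigr) \in H^{s-2}(\Omega)
   \quad\text{(two derivatives lost)}.
\end{align*}
```

A gradient in the sense required for the Levi-Civita connection would have to lie in $H^s(\Omega)$, which this count does not provide in general.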


B Proof of Proposition 2.5 

Proof. For given positive functions $c, \lambda$ on $\Omega$, one has

$$\phi^*\Bigl(m g + \frac{c}{m}\, dm^2\Bigr) = m\bigl(g + c\,(d\lambda)^2\bigr) + 2 c \lambda\, d\lambda\, dm + \frac{c}{m} \lambda^2\, dm^2. \tag{B.1}$$

Using that $h(x,m) = m\, h(x) + a(x)\, dm + b(x) \frac{dm^2}{m}$, the result is
satisfied if and only if the following system has a solution:

$$\begin{cases} c\, d(\lambda^2) = a \\ c\, \lambda^2 = b. \end{cases} \tag{B.2}$$

Therefore, since $c, \lambda, b$ are positive functions, dividing the first
equation by the second gives:

$$\frac{d(\lambda^2)}{\lambda^2} = \frac{a}{b}.$$

This equation has a solution if and only if $\frac{a}{b}$ is exact. Then, $c$ can
be deduced using the second equation of the system.
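The exactness condition can be made explicit. As a sketch, write $\lambda$ for the positive scaling function of system (B.2) and assume $a/b = df$ for some function $f$ on a connected domain (the names $f$ and $K$ are introduced here only for illustration). The system is then solved as follows:

```latex
\begin{align*}
\frac{d(\lambda^2)}{\lambda^2} = d\bigl(\log \lambda^2\bigr) = df
  &\;\Longrightarrow\; \lambda^2 = K e^{f}, \quad K > 0 \text{ constant}, \\
c\,\lambda^2 = b
  &\;\Longrightarrow\; c = \frac{b}{K}\, e^{-f}, \\
\text{check:}\quad c\, d(\lambda^2)
  &= \frac{b}{K}\, e^{-f} \cdot K e^{f}\, df = b\, df = a .
\end{align*}
```

Conversely, any positive solution $\lambda$ gives $a/b = d(\log \lambda^2)$, which is exact.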

The last point consists in proving that $g = h - c(d\lambda)^2$ is a metric on
$\Omega$. Using the relation in system (B.2), we get
$c(d\lambda)^2 = \frac{a^2}{4b}$. Let us consider
$(v_x, v_m) \in T_{(x,m)}(\Omega \times \mathbb{R}_+^*)$ for a non-zero vector
$v_x$; then $m\, h(x)(v_x,v_x) + a(x)(v_x)\, v_m + \frac{b(x)}{m} v_m^2$ is a
polynomial function in $v_m$ whose discriminant is necessarily strictly negative,
since the quadratic form is positive definite and $(v_x, v_m) \neq 0$ for all
$v_m$. Therefore, we obtain

$$a(x)(v_x)^2 < 4\, b(x)\, h(x)(v_x,v_x) \tag{B.3}$$

which gives $h(x)(v_x,v_x) - c(x)\, d\lambda(x)(v_x)^2 > 0$. □
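The step from the discriminant to (B.3), and from (B.3) to the positivity of $g$, can be spelled out as follows. This is a sketch using only the quantities defined above, with $P$ and $\Delta$ as names introduced here for the polynomial in $v_m$ and its discriminant:

```latex
\begin{align*}
P(v_m) &= \frac{b(x)}{m}\, v_m^2 + a(x)(v_x)\, v_m + m\, h(x)(v_x, v_x), \\
\Delta &= a(x)(v_x)^2 - 4\,\frac{b(x)}{m}\, m\, h(x)(v_x, v_x)
        = a(x)(v_x)^2 - 4\, b(x)\, h(x)(v_x, v_x) < 0, \\
g(x)(v_x, v_x) &= h(x)(v_x, v_x) - \frac{a(x)(v_x)^2}{4\, b(x)} > 0 .
\end{align*}
```

The last line divides the inequality $\Delta < 0$ by $4 b(x) > 0$ and uses the identity relating $c$, the scaling function and $a^2/(4b)$ obtained from system (B.2).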